
A Hand-Drawn Barcode

by

Daniel Klöck

Dipl.-Inf., BTU Cottbus (2009)

Submitted to the Department of Computer Science

in partial fulﬁllment of the requirements for the degree of

Master in Artiﬁcial Intelligence and Deep Learning

at the

UNIVERSITY OF ALCALÁ

October 2020

©Daniel Klöck, MMXX. All rights reserved.

The author hereby grants to UAH permission to reproduce and to

distribute publicly paper and electronic copies of this thesis document

in whole or in part in any medium now known or hereafter created.

Author..............................................................

Department of Computer Science

October 18, 2020

Accepted by.........................................................

José Ignacio Olmeda Martos

Chairman, Department Committee on Graduate Theses

I hereby conﬁrm that this thesis was written independently by myself without the

use of any sources beyond those cited, and all passages and ideas taken from other

sources are cited accordingly.


A Hand-Drawn Barcode

by

Daniel Klöck

Submitted to the Department of Computer Science

on October 18, 2020, in partial fulﬁllment of the

requirements for the degree of

Master in Artiﬁcial Intelligence and Deep Learning

Abstract

By studying how characters from different alphabets are written, an adequate set of substructures that can be drawn swiftly and effortlessly is identified. Subsequently, a way to compose a hand-drawn barcode is presented, optimizing information density to increase the amount of contained data while remaining easy and fast to draw. A recognition procedure is defined, and different models for barcode detection and substructure classification are presented and evaluated. Possible value encoding error sources are examined, and the recognition procedure is reviewed and tested for its accuracy. Finally, the probability of a successful recognition is studied and improved by choosing a suitable forward error correction method.


Acknowledgements

I am extremely grateful to my beloved wife Aleksandra Kucharczuk-Klöck for her

care and support.

My sincere thanks also go to all members of the UAH’s Master in Artificial

Intelligence and Deep Learning course of 2019-2020 for all the help and discussions.

Especially to Ming Lei, Fabrice Aubert, Gianni Santinelli, Genís Virgili Sánchez,

Micheline Pollock, Irene van den Broek, Jesús Chávez and Brian Naranja.

Finally, I thank everyone that helped generate the hand-drawn barcode dataset.

Especially, Sandie Klöck, Maja Kucharczuk and Jorge Gangoso Klöck.


Contents

1 Introduction

2 Identifying Symbols and Structure
2.1 Defining Drawing Complexity
2.1.1 Simplicity by Similarity
2.1.2 Simplicity by Speed
2.2 Exploring Symbols
2.3 A Barcode Proposal

3 Detecting the Barcode and Extracting its Value
3.1 Detecting the Barcode and its Parts
3.1.1 Faster R-CNN ResNet50 V1 640x640
3.1.2 CenterNet HourGlass104 512x512
3.1.3 EfficientDet D2 768x768
3.1.4 Evaluation
3.2 Calculate the Rotation and Extract the Bars
3.3 Classifying the Bars

4 Value Encoding and Decoding
4.1 Bit Order
4.2 Error Sources
4.3 Error Detection and Correction
4.3.1 Error Detection and Correction using Linear Codes
4.3.2 Evaluation of Error Correction by Exploiting Model Confidence

5 Future Lines of Research

A Additional Information


List of Figures

2-1 Omniglot characters with their Omniglot ids sorted by median time spent writing.

2-2 Median speed compared to median number of strokes of characters.

2-3 All possible single substructure bars and their value while representing a decimal number between 0 and 2²⁰ − 1.

2-4 A bar that contains all possible substructures.

3-1 Faster R-CNN structure. Image from [1].

3-2 Example results of the barcode and parts detection model using “Faster R-CNN ResNet50 V1 640x640” on validation images.

3-3 Architecture of CenterNet. Image from [2].

3-4 Architecture of EfficientDet. Image from [3].

3-5 Result examples of extracted and rotated bars.

3-6 Bar with flipped counterparts for data augmentation.

3-7 Image of a bar with a corrected substructure.

4-1 Order of the bits in a bar.


List of Tables

3.1 Evaluation of experimented barcode detection models.

3.2 Reached accuracy with lowest validation loss on substructure classification models.

3.3 Confusion values of classified substructures using EfficientNetB2†.

4.1 Error rates of popular 1D-barcodes (with 95% confidence) [4].

4.2 Linear code choice recommendations for different numbers of bars.


Chapter 1

Introduction

Many researchers have worked on improving the recognition accuracy of mathematical

symbols [5], digits [6, 7], sketches [8], text [9], patterns [10] and other symbols [11,

12]. However, little research has been published to determine what impact different patterns or sets of symbols could have on the recognition difficulty. This makes the task of sketching the right set of symbols for later detection very complicated, even more so if the person does not have in-depth knowledge of the system that will be used for recognition. Currently, if hand-drawn symbols need to be recognized, there is no guide as to which symbols to use.

In Section 2.1 the identiﬁcation of symbols with low drawing complexity will be

made possible by declaring what it means for a symbol to be easy to draw. Then,

in Section 2.2, the writing style used for characters of the Omniglot dataset [13] will

be examined. Subsequently, by using the deﬁnitions from Section 2.1, symbols that

are quick and easy to draw will be detected. In Section 2.3, those symbols will be

organized in a structure, creating a novel barcode that can be hand-drawn. That

structure will optimize information density and minimize hand-drawing complexity.

In Chapter 3, a recognition procedure for this kind of barcode will be presented.

This procedure will need an object detection model that can detect the barcode and its

parts. That model will be chosen, trained and evaluated in Section 3.1. The procedure

will also require a multi-label bar classiﬁcation model, which will be selected from a

list of modern classiﬁcation models and evaluated in Section 3.3.


In Chapter 4, encoding, decoding, sources of error as well as expected accuracies

will be discussed. Section 4.3 will recommend adequate forward error correction as

well as error detection mechanisms to improve the accuracy by using redundancy bits.

Note that Appendix A contains information about how to acquire source code,

conﬁguration ﬁles and images related to this thesis.


Chapter 2

Identifying Symbols and Structure

2.1 Deﬁning Drawing Complexity

As a ﬁrst step in the search for a hand-drawn barcode, drawing complexity was

explored to make sure that every part of the barcode can be easily and accurately

drawn by anyone. Unfortunately, to my knowledge, there is no published research

about ﬁnding or comparing the complexity of drawing symbols. To overcome this

lack of previous work, two axioms will be formulated that will make it possible to

discern symbols that are easy to draw from those that are more complex.

2.1.1 Simplicity by Similarity

Axiom 1 (Axiom of simplicity by similarity). A symbol is easier to draw if it is usually replicated with more accuracy.

Axiom 1 means that if the same symbol is drawn several times by one or multiple

persons, the similarity of the resulting images will be higher with an easy symbol than

with a complex symbol. To use this approach, an image similarity measure must be

selected. There are several possibilities, but the assumptions taken by the techniques

are critical and may lead to erroneous results when calculating drawing complexity.

For example, we cannot assume that we are analysing variations of the same image.

Some options for defining the image similarity would be the Image Euclidean Distance (IMED) [14], the Structural Similarity Index (SSIM) [15] or the Modified Hausdorff

Distance [16]. Another option would be using the entropy of the image [17], which

may be based on histogram values [18]. However, in this case, pixel positions may

play an important role and should not be ignored.

2.1.2 Simplicity by Speed

The second method is based on the following premise:

Axiom 2 (Axiom of simplicity by speed). A symbol is easier to draw if it is usually drawn in a shorter time.

Using this method would not only give more reliable results, since no other as-

sumptions or deﬁnitions are needed, but it would also yield symbols that reduce the

time needed to draw the barcode, which would also be a desirable property. Due to

its simplicity and relation to drawing speed, Axiom 2 was used for searching for a set

of symbols that would be easy enough for everyone.

2.2 Exploring Symbols

To explore the writing speed of symbols from diﬀerent alphabets, the Omniglot

dataset [13] was used. Its Github page [19] describes its content as follows:

“It contains 1623 diﬀerent handwritten characters from 50 diﬀerent alphabets. Each

of the 1623 characters was drawn online via Amazon’s Mechanical Turk by 20 diﬀer-

ent people. Each image is paired with stroke data, a sequence of [x,y,t] coordinates

with time (t) in milliseconds.”

By sorting the characters of all alphabets by median writing speed, the symbols

that are easiest and fastest to draw could be found. Note that, taking into consideration Axiom 1, these will also be the simplest symbols to replicate accurately, thus increasing the probability of a correct recognition.


In Figure 2-1, the 100 symbols that were drawn the fastest are shown. As expected, the top symbols are geometric primitives, i.e. dot, line (with different rotations), different curves and circle, followed by shapes that could be seen as combinations of those primitives, such as ‘∧’ (two lines), ‘:’ (two dots), ‘!’ (line and dot) and ‘⊥’ (two lines).

Figure 2-1: Omniglot characters with their Omniglot ids sorted by median time spent writing.

If we also take into account that we tend to misperceive curvature, direction

and length due to the nature of our eye movements [20], curves and structures where

direction and length are important (such as would be the case when drawing a barcode

based on modules similar to code128 based symbology [21]) should be discarded as

well.

By comparing the median speed to the median number of strokes of the characters

(see Figure 2-2), a clear tendency can be observed that shows that the more strokes

a symbol needs, the more time it will take to draw it.

Given these findings, it seems that a set of symbols that are easy and fast to

draw should consist of symbols based on circles, dots and lines with as few occurrences

of them as possible.

Further, a symbol should consist of substructures, since we can increase the number of objects it can represent exponentially by additional strokes instead of linearly by finding a new symbol. For example, a symbol that may or may not contain 15 strokes can represent 2¹⁵ = 32,768 different objects; finding that amount of symbols that are easy and fast to draw may be much more complicated.

Figure 2-2: Median speed compared to median number of strokes of characters.

2.3 A Barcode Proposal

I propose a hand-drawn barcode whose primary goal is the optimization of drawing speed, accuracy and information density. The barcode will be drawn on a straight horizontal line starting and ending with upward-facing lines. This will be enough to determine the direction of the barcode, since it will always be read left to right. It will contain a number of vertical lines (which will be called “bars”) that serve as the ground structure for sketching the symbols that represent the data. Each bar is made of at most 10 additional lines.

Each bar may or may not contain any subset of 20 different substructures, which amounts to 20 bits per bar. This means that one single bar can represent 2²⁰ = 1,048,576 different objects when no error correction or detection is included. In Figure 2-3 you can see these 20 possible substructures with their values when the barcode is used to represent a decimal number between 0 and 1,048,575.

Figure 2-3: All possible single substructure bars and their value while representing a decimal number between 0 and 2²⁰ − 1.
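The mapping from active substructures to a bar’s value can be sketched as follows (a minimal illustration; the function name is hypothetical):

```python
def bar_value(active_substructures):
    """Value represented by a single bar, given the indices (0-19) of its
    active substructures; each substructure contributes one bit."""
    return sum(1 << i for i in set(active_substructures))
```

For instance, a bar with only substructure 0 active represents the value 1, while a bar containing all 20 substructures represents 2²⁰ − 1.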

These substructures are combined to create new values; for example, Figure 2-4 shows the symbol when all substructures are present. Note how lines on opposite sides of the bar can be drawn with a single stroke, resulting in fewer needed geometric primitives. Further, since substructure lines are either at one end, touching the horizontal line, or at the middle of the vertical lines, misperceived length should not be an issue.

When more than the number of objects that a single bar can represent are needed, more bars can be attached, resulting in a barcode that can represent up to 2^(bars·20) different objects. Note that a dynamic barcode that can contain between 1 and n_b bars can represent up to ∑_{x=1}^{n_b} 2^(x·20) different objects if you distinguish between numbers with different numbers of leading zeroes, i.e. a barcode that contains a specific value with one bar is different from a barcode that contains the same value with 2 bars.
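The capacity of such a dynamic barcode can be computed directly from the sum above (a small sketch; the function name is hypothetical):

```python
def dynamic_capacity(n_bars):
    """Number of distinct objects a barcode with between 1 and n_bars bars can
    represent, counting the same value drawn with different bar counts
    as distinct."""
    return sum(2 ** (20 * x) for x in range(1, n_bars + 1))
```

For example, one bar gives 2²⁰ = 1,048,576 objects, and allowing up to two bars gives 2²⁰ + 2⁴⁰ objects.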

Tests with grid structures (2-dimensional bar positioning) have also been made, but these were discarded because the user had to draw the grid with the correct size before adding the substructures, while with this structure, line sizes can be corrected a posteriori.

Figure 2-4: A bar that contains all possible substructures.

Using dot substructures was also discarded due to user feedback: they were perceived as difficult to draw, since the proper position was not understood. As [22] shows, we read connected structures faster than unconnected ones, which may suggest that, for us humans, it is harder to recognize the relation between unconnected structures, thus rendering it more complicated to find the correct location for dots within a symbol. Another disadvantage of dots is that it is not possible to draw several of them with one stroke, which is a line feature that this barcode exploits.


Chapter 3

Detecting the Barcode and

Extracting its Value

To be able to train an object detection model, a dataset of hand-drawn barcodes

with suﬃcient samples had to be generated ﬁrst. To collect the data, a “Hand-drawn

Barcode User Study” webpage [23] was implemented, using Python [24], Flask [25],

Skeleton [26] and Cloudinary [27] as the main technologies, and it was hosted on Heroku

[28]. On this webpage, any user willing to help with the data generation would get a

random barcode displayed, which had to be copied with pen and paper.

As a result, between August 27th and 31st, 149 hand-drawn barcodes were uploaded

from 23 unique IPs. After removing images where the full barcode was not visible,

144 images remained, containing exactly one barcode each. The website also named

the uploaded ﬁle with the code that the depicted bars describe (e.g. a barcode that

contains the numbers 5023 and 101 in 2 bars would be named “5023_101”) to make

the classiﬁcation of the bars easier. Finally, all images were manually annotated with

bounding boxes for contained barcode start symbol, barcode end symbol, bars and

complete barcodes using “RectLabel for object detection” [29].

Once a barcode has been generated and hand-drawn by a user, the next task would be to recognize its encoded value. Even if a one-stage architecture could probably yield good results if more samples were available, a multi-stage architecture was chosen because it made it possible to use sample augmentation without distortion and to give more homogeneous input images to the classifier of the bars. The chosen architecture would need to implement the following execution steps:

1. Predict the bounding box of the full barcode, the single bars, as well as the

starting and ending symbols.

2. Calculate the rotation of the barcode by analyzing the relative position of the starting and ending symbols.

3. Extract the bars sorted by distance to the starting symbol and rotate them to a vertical position where the starting symbol is to the left.

4. Classify the bars with a multi-label system where each of the bar’s bits corresponds to one class.

5. Calculate the value represented by the barcode using the decoding function.

The sections of this chapter will describe and evaluate the models and algorithms

that execute steps 1 through 4, as well as the methods followed to ﬁnd the right

candidates. The next chapter will propose an encoding and decoding technique.

3.1 Detecting the Barcode and its Parts

As seen in the introduction of this chapter, the ﬁrst step to determine the value that

is encoded in a barcode is to detect the barcode and its parts. To make the prediction

as accurate as possible, 3 object detection models were trained and evaluated using

the annotated barcode images to predict the position of the full barcode, the start of

the barcode symbol, the end of the barcode symbol and the bars. The chosen models

were “Faster R-CNN ResNet50 V1 640x640”, “CenterNet HourGlass104 512x512”

and “EﬃcientDet D2 768x768” from the Tensorﬂow 2 Object Detection Model Zoo

[30]. All models were pre-trained on the COCO dataset [31].

The configurations of these models were changed to search for 4 label classes, the TPU capability was deactivated and the data augmentation options were set to ‘random_rgb_to_gray’, ‘random_adjust_brightness’, ‘random_adjust_contrast’, ‘random_adjust_hue’, ‘random_adjust_saturation’ and ‘random_distort_color’. Finally, the ‘fine_tune_checkpoint_type’ property was set to ‘detection’ (used when loading models pre-trained on other detection tasks) for all models except “CenterNet HourGlass104 512x512”, since it needs the special setting ‘fine_tune’ (used when loading the entire CenterNet feature extractor pre-trained on other tasks).

All models were locally trained on a GeForce GTX 1070 (Compute Capability

6.1) using the Tensorﬂow 2 Object Detection API with CUDA 10.1 for Windows 10

(cuda_10.1.243_426.00_win10), Protoc 3.13.0-win64 and CuDNN 7.6.5. The evalu-

ation was executed in parallel using the CPU.

3.1.1 Faster R-CNN ResNet50 V1 640x640

The “Faster R-CNN ResNet50 V1 640x640” model is based on the popular Faster

R-CNN [1] architecture, which is an object detection system composed of a deep fully

convolutional network that proposes regions and a Fast R-CNN detector [32], which in turn is based on R-CNN [33].

Figure 3-1: Faster R-CNN structure. Image from [1].

The Fast R-CNN architecture produces a convolutional feature map by processing the

input image with convolutional and max-pooling layers. The

region proposal network uses the feature maps to generate

an output of rectangular object proposals with an objectness

score. For each region proposal, a feature vector is extracted

from each feature map by a region of interest pooling layer.

These feature vectors are then passed to fully connected lay-

ers to produce two outputs, the position of the bounding box

and the softmax probability that the region of interest con-

tains one of the target classes. This concrete implementation

of the model uses an input image with a resolution of 640 by

640 RGB pixels. As the backbone, a 50 layer deep residual network [34] has been used

as opposed to the original ZF-NET [35] and VGG-NET [36] (which used to be pre-trained on ImageNet [37]) [38]. The advantage of ResNet over VGG is that it is larger, which means that it has more capacity to learn what is needed. Further, ResNet uses residual connections and batch normalization, which had not yet been invented when VGG was first released. According to the Tensorflow documentation, this model reached a COCO mAP¹ of 29.3 and a mean time of running inference of 53 milliseconds.

Figure 3-2: Example results of the barcode and parts detection model using “Faster

R-CNN ResNet50 V1 640x640” on validation images.

3.1.2 CenterNet HourGlass104 512x512

One property that distinguishes Faster R-CNN from CenterNet [2] is that the latter

is a one-stage object detector. Another diﬀerence is that it detects each object as a

¹Mean Average Precision, an accuracy metric for object detectors.

triplet, instead of a pair, using a keypoint estimator to ﬁnd center points and regress to

all other object properties. The center point based approach is said to be end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding box based detectors [39]. The idea behind this model is that if a predicted bounding box has a high IoU² with the ground truth, it is also highly probable for the center point to be in the center region of the bounding box, and vice versa, which enables a more efficient way of searching for objects by using their center points.

As seen in Figure 3-3, the main part of this architecture consists of two modules

named cascade corner pooling and center pooling, which play the roles of enriching

information collected by the top-left and bottom-right corners and providing more

recognizable information at the central regions.

Figure 3-3: Architecture of CenterNet. Image from [2].

As the backbone, HourGlass-104 was used, which yielded the best keypoint estimation performance in the evaluation done in [39]. This specific implementation

uses an input image with a resolution of 512 by 512 RGB pixels. According to the

Tensorﬂow documentation, this model reached a COCO mAP of 41.9 and a mean

time of running inference of 70 milliseconds.

3.1.3 EﬃcientDet D2 768x768

EﬃcientDet D2 employs an EﬃcientNet [40] B2 network as the backbone, a Weighted

Bi-directional Feature Pyramid Network (also known as BiFPN) with 112 channels

²Intersection over Union, an evaluation metric used to measure the accuracy of an object detector, calculated as the area of overlap divided by the area of union of the predicted and the ground truth bounding boxes.

and 5 layers, 3 box/class layers and an expected input size of 768 by 768 RGB

pixels. EﬃcientDet’s BiFPN incorporates multi-level feature fusion allowing data to

ﬂow in both directions, top-down and bottom-up, while using regular and eﬃcient

connections. EﬃcientNet is a Convolutional Network, developed by Google’s Brain

team, that seeks to optimize downstream performance given free rein over depth,

width and resolution while staying within the constraints of target memory and target

FLOPs [41].

According to the Tensorﬂow documentation, this model reached a COCO mAP

of 41.8 and a mean time of running inference of 67 milliseconds.

Figure 3-4: Architecture of EﬃcientDet. Image from [3].

3.1.4 Evaluation

After training the models until a clear overfitting pattern emerged, the step with the lowest total loss on the evaluation dataset was taken, since it should generalize better than any other. In Table 3.1 it can be seen that the Faster R-CNN model reached the best mean Average Precision for all explored IoU thresholds on that step.

For the ﬁrst version of a barcode detector, accuracy seemed more important than

speed and thus, speed was ignored as long as it was within the limit of usable software,

which was the case for all of these models.


Model Total Loss mAP@.50:.95IoU mAP@.50IoU mAP@.75IoU

Faster R-CNN 0.5845 0.68 0.98 0.7619

CenterNet 1.714 0.573 0.9504 0.6086

EﬃcientDet 0.3475 0.631 0.9681 0.7066

Table 3.1: Evaluation of experimented barcode detection models.

Note that the total losses cannot be compared between models, since the loss

functions are diﬀerent from each other and thus, have diﬀerent meanings.

Some models that were part of the initial proposal of this thesis did not become candidates for different reasons: e.g. SSD models use an aspect ratio which would probably not work very well with barcodes that have a dynamic width but a rather static height, and YOLO trades accuracy for speed.

3.2 Calculate the Rotation and Extract the Bars

After the model detects the barcode and its parts, the angle of the barcode is calcu-

lated as:

angle = atan2(y − y′, x − x′)

where:

𝑦= the y coordinate of the center of the ending symbol’s bounding box.

𝑦′= the y coordinate of the center of the starting symbol’s bounding box.

𝑥= the x coordinate of the center of the ending symbol’s bounding box.

𝑥′= the x coordinate of the center of the starting symbol’s bounding box.

Or in other words, the angle of the barcode is calculated as the angle between

the starting and ending symbol center points. Note that an angle of 0 corresponds

to a perfectly aligned image where no rotation is needed, which means that the start

symbol can be found to the left of the barcode and the ending symbol to the right at

the same height.
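Assuming bounding boxes in (xmin, ymin, xmax, ymax) form, the calculation above can be sketched as follows (the function name and box layout are illustrative assumptions, not the exact thesis code):

```python
import math

def barcode_angle(start_box, end_box):
    """Angle between the centers of the start and end symbol bounding boxes.
    Boxes are (xmin, ymin, xmax, ymax); an angle of 0 means no rotation
    is needed."""
    sx, sy = (start_box[0] + start_box[2]) / 2, (start_box[1] + start_box[3]) / 2
    ex, ey = (end_box[0] + end_box[2]) / 2, (end_box[1] + end_box[3]) / 2
    return math.atan2(ey - sy, ex - sx)
```

A start symbol directly to the left of the end symbol at the same height yields an angle of 0, matching the perfectly aligned case described above.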


Then the bars are extracted, meaning that new images are created for the bounding box areas that correspond to bars of the input image.

Figure 3-5: Result examples of extracted and rotated bars.

Finally, the extracted images are rotated by the barcode’s angle and ordered by distance to the starting symbol. This will ensure that the classification model receives homogeneous images, namely bars with the same rotation and only the part of

the image that is needed for the classiﬁcation, which should

increase the accuracy. Additionally, the images can be ﬂipped

vertically, horizontally and both to produce new bar images.

These new images would contain diﬀerent bits than the original image and can be

used for training or evaluation. However, the original and the augmented images were

kept together either in the training or in the validation dataset to reduce correlation

between these datasets.
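The three flipped copies can be produced with NumPy, as in this sketch; note that each flip also changes which substructure bits are active (per Figure 3-6), so the label bit-vectors must be permuted accordingly — that mapping depends on the bit layout and is omitted here:

```python
import numpy as np

def flip_variants(bar_image):
    """Return the horizontally, vertically, and doubly flipped copies of a
    bar image (height x width [x channels]). The corresponding label
    bit-vectors must be remapped to the flipped substructure positions."""
    return [
        np.flip(bar_image, axis=1),        # horizontal flip
        np.flip(bar_image, axis=0),        # vertical flip
        np.flip(bar_image, axis=(0, 1)),   # flipped both ways
    ]
```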

(a) Extracted bar, representing 923680. (b) Horizontally flipped, representing 861200. (c) Vertically flipped, representing 16775. (d) Horizontally and vertically flipped, representing 33355.

Figure 3-6: Bar with flipped counterparts for data augmentation.

3.3 Classifying the Bars

Once the bars are ordered and rotated, they can be classiﬁed to ﬁnd which bits are

active. The basic idea is to give each substructure a class label and use a classiﬁer to

predict which of those classes are represented in the bar image. This means that the

classiﬁer model has to solve a multi-label classiﬁcation problem.


Model Training Accuracy Validation Accuracy

VGG16 0.9758 0.8683

VGG16* 0.9718 0.9420

VGG19 0.9663 0.8555

VGG19* 0.9719 0.9322

EﬃcientNetB1 0.9264 0.8842

EﬃcientNetB1* 0.9759 0.9528

EﬃcientNetB1** 0.9637 0.9255

EfficientNetB1† 0.9881 0.9615

EﬃcientNetB2 0.9281 0.8703

EﬃcientNetB2* 0.9771 0.9463

EfficientNetB2† 0.9928 0.9723

EﬃcientNetB3 0.9000 0.8430

ResNet50 0.9224 0.8050

ResNet50* 0.9798 0.9345

ResNet101 0.8606 0.8045

ResNet101* 0.9041 0.8805

DenseNet121 0.7657 0.7115

* Removing the last block of layers

** Removing the last 2 blocks of layers

†Training the last block of layers

Table 3.2: Reached accuracy with lowest validation loss on substructure classiﬁcation

models.

For experimentation, multiple classifier models based on the pre-trained models from Tensorflow’s Keras section, specifically keras.applications, were used. In all cases, the input shape has been changed to 450 by 100 RGB to make it more similar to the output shape of the detected bars. Additionally, the top (the flattening and fully connected layers stacked on top of the models) has been dropped since a new output format is needed. To make up for the removed top, a flattening layer and a fully connected layer with a sigmoid activation function ending in 20 output nodes (one for each substructure in a bar) have been appended.

Some of the experiments only trained the top dense layer of the model. Others additionally allowed training the last block of layers of the pre-trained model to train

the bigger features for barcodes. A third kind of experiments removed the last block

of layers altogether and appended the new top to the previous block of layers to make

sure that the dense network was not inﬂuenced by the bigger features of the image

dataset used to pre-train the network, since they probably would be very diﬀerent.

All models were compiled using binary cross-entropy as loss function and Adam [42]

as optimizer.
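Following the description above, one such classifier can be assembled like this (a sketch under the stated assumptions, not the exact thesis training code; the function name is hypothetical):

```python
from tensorflow import keras

def build_bar_classifier(weights="imagenet"):
    # Pre-trained backbone with its top removed; input reshaped to 450x100 RGB.
    base = keras.applications.EfficientNetB2(
        include_top=False, weights=weights, input_shape=(450, 100, 3))
    base.trainable = False  # some experiments additionally unfreeze the last block
    model = keras.Sequential([
        base,
        keras.layers.Flatten(),
        # One sigmoid output per substructure: a multi-label prediction head.
        keras.layers.Dense(20, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```

The sigmoid head with binary cross-entropy treats each of the 20 substructures as an independent yes/no decision, which is what makes this a multi-label rather than a multi-class problem.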

From the initial dataset of 144 images, 495 bars were automatically extracted, augmented to 1980 bars⁴ and labelled with their corresponding value of active bits. Those images were split into 1780 training images and 200 validation images. Additionally, the training images have been augmented with an ImageDataGenerator that can apply a subset of the following transformations: channel shift, brightness shift, shear angle, rotation, zoom.

Figure 3-7: Image of a bar with a corrected substructure.

Table 3.2 shows the reached accuracy on the training and validation images in the step with the lowest loss on the validation data, since it should generalize best. As shown, the pre-trained EfficientNetB2 model that allowed training the last block of layers had the best accuracy on the validation data, reaching 97.23%.

Table 3.3 shows the confusion values of the single substructures as well as the summed results over the 20 substructures for the 200 validation images. It can be seen that all bits have similar

confusion percentages and are behaving as expected.

Further, it could be observed that the model adapted to unforeseen user behaviour.

For example, the model learned how users ﬁx incorrectly drawn lines, such as the one

a user corrected in the bar seen in Figure 3-7.

⁴Using the technique described in Section 3.2.

True Positive False Positive True Negative False Negative

Bit 1 117 4 79 0

Bit 2 112 6 77 2

Bit 3 101 6 91 2

Bit 4 101 3 94 2

Bit 5 89 7 104 0

Bit 6 87 4 107 2

Bit 7 91 1 102 6

Bit 8 95 3 100 2

Bit 9 100 3 96 1

Bit 10 99 7 92 2

Bit 11 98 3 96 3

Bit 12 101 5 94 0

Bit 13 94 0 103 3

Bit 14 95 3 100 2

Bit 15 86 3 108 3

Bit 16 88 3 108 1

Bit 17 102 2 95 1

Bit 18 101 2 95 2

Bit 19 115 3 80 2

Bit 20 114 1 82 3

Summed 1986 69 1903 42

Table 3.3: Confusion values of classiﬁed substructures using EﬃcientNetB2†.
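From the summed confusion counts in Table 3.3, one can estimate the per-bit error rate and, assuming independent bit errors (an assumption of this sketch, not a claim of the thesis), the probability of reading a whole bar correctly:

```python
# Summed confusion counts over the 200 validation bars (20 bits each), Table 3.3.
TP, FP, TN, FN = 1986, 69, 1903, 42
total_bits = TP + FP + TN + FN          # 200 bars x 20 bits = 4000

bit_error_rate = (FP + FN) / total_bits  # fraction of misclassified bits
# Assuming independent bit errors, a bar is read correctly only if
# all 20 of its bits are classified correctly:
bar_accuracy = (1.0 - bit_error_rate) ** 20

print(f"bit error rate: {bit_error_rate:.4%}")    # about 2.78%
print(f"whole-bar accuracy: {bar_accuracy:.2%}")  # about 57%
```

The per-bit accuracy of roughly 97.2% matches the validation accuracy reported in Table 3.2, but the much lower whole-bar probability illustrates why the error correction of Chapter 4 is needed.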


Chapter 4

Value Encoding and Decoding

As mentioned in earlier chapters, each substructure of a bar represents a bit. If the substructure is present in the bar, the bit is 1; if not, it is 0.

Due to the low data density compared to digital barcodes and the variable classification error, I suggest developing customized encoding and decoding mechanisms depending on the application and the quality of the models.

In this chapter, some possible encoding, decoding and error correction techniques

will be described.

4.1 Bit Order

Figure 4-1: Order of the bits in a bar.

One definition for the bit order of the substructures can be seen in Figure 4-1. The number next to each substructure represents the index of the bit (using little-endian format). For example, if only substructure 19 is active, the bar represents the binary number 0b1000 0000 0000 0000 0000. If a second bar existed to the right, the first bar's value would be shifted left 20 places. For example, if a second bar were added to the right of the bar from the previous example and it had all of its bits active, the new value would be 0b1000 0000 0000 0000 0000 1111 1111 1111 1111 1111. This mechanism allows us to convert barcodes into noisy bit streams with blocks of 20 bits. The stream is noisy because of possible erroneous substructure classifications.
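The bit-order and shifting convention described above can be sketched in a few lines of Python. The function name and list representation are illustrative, not the thesis code:

```python
# Illustrative sketch of the bit-order convention: each bar contributes a
# block of 20 bits, and every bar added to the right shifts the earlier
# bars' value left by 20 places.

BITS_PER_BAR = 20

def bars_to_value(bars):
    """bars: list of 20-element 0/1 lists, leftmost bar first.
    Bit j of a bar (little-endian substructure index) contributes 2**j."""
    value = 0
    for bar in bars:
        assert len(bar) == BITS_PER_BAR
        bar_value = sum(bit << j for j, bit in enumerate(bar))
        value = (value << BITS_PER_BAR) | bar_value
    return value

# One bar with only substructure 19 active:
one_bar = [[0] * 19 + [1]]
assert bars_to_value(one_bar) == 0b1000_0000_0000_0000_0000

# A second, all-ones bar to the right shifts the first bar left 20 places:
two_bars = [[0] * 19 + [1], [1] * 20]
assert bars_to_value(two_bars) == 0b1000_0000_0000_0000_0000_1111_1111_1111_1111_1111
```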

4.2 Error Sources

Since the barcode can be transformed into a noisy bit stream with blocks of 20 bits, typical error detection and correction techniques can be used. However, we have some additional knowledge about the channel that may help us make better decisions about where an error may have originated and how to correct or detect it, not only through adequate coding but also by implementing an appropriate usability flow. We can also calculate the needed error detection and correction capabilities by studying the error sources.

There are three sources of error: the first is a possible human error while drawing the barcode; the second, a wrong detection of the barcode parts by the object detection model (for example, by missing a bar); and the third, a wrong classification of a substructure.

To address human error, I would suggest adding a validation feature, that is, a detection attempt the user runs on their own to make sure the barcode was drawn correctly, whenever possible. However, take into account that in some situations this may not be possible (e.g. the user may not have a camera). Therefore, I would recommend always adding error correction or detection capabilities based on the application's need for accuracy. Section 4.3 will give more information on how to construct such a mechanism.

The second possible source of error, the barcode detection model, could only be trained with very few samples (144 barcodes) in this first version. This led to an error rate of about 2%, even when choosing a threshold of 0.5 IoU (as seen in Subsection 3.1.4). This error rate is much higher than that of typical digital barcodes, which reach accuracies of 1 error in 394 thousand even in the worst-case scenario of the simplest barcode types (see Table 4.1). However, by choosing the right angle and position of the camera, the rate can probably be improved. Therefore, I would suggest, at least until more training data is available, using a fixed number of bars and only accepting the barcode when all needed pieces have been recognized.

Barcode Type | Worst Case Accuracy | Best Case Accuracy
Code 128 | 1 error in 2.8 million | 1 error in 37 million
Code 39 | 1 error in 2.5 million | 1 error in 34 million
UPC or more | 1 error in 394 thousand | 1 error in 800 thousand

Table 4.1: Error rates of popular 1D barcodes (with 95% confidence) [4].

The last source of error is a wrong classification of a substructure. The substructure classification model yielded a probability of correct classification of 0.9723, which means that the probability of error is 0.0277. This implies that the probability of at least one error occurring in a bar is approximately 42.98%¹. This number is far too high to be in the range of useful barcodes. Therefore, using a coding mechanism to correct and detect errors is crucial.

4.3 Error Detection and Correction

Improving the angle and position of the camera and having a fixed number of bars can help reduce the errors originating from the object detection model. Human error and wrong bar substructure classification can be overcome with a selected minimal accuracy by using forward error correction techniques. Two of them will be explored: error correction through linear codes with a minimal Hamming distance, and error correction by flipping the lowest-confidence bits.

The basic idea of error-correcting codes is to reduce the number of accepted code words, maximizing the distance between words. A standard distance between words is the Hamming distance [43], which in the case of binary words is defined as the number of bits that have to be flipped to get from one word to another. For example, you can get from the word 001 to 100 by flipping 2 bits (the first and the last); thus, the pair has a Hamming distance of 2.

¹ The complement of the probability of no errors occurring in 20 bits: 1 − 0.9723^20 ≈ 0.4298.

A property of a set of allowed words with a minimal Hamming distance d_H (that is, no two words in the set have a Hamming distance smaller than d_H) is that d_H − 1 errors can be detected and ⌊(d_H − 1)/2⌋ errors corrected.

For example, if we only allow the recognition of the words 001 and 100, then, since the minimal Hamming distance is 2, we will be able to detect whether an error of 2 − 1 = 1 bit flip happened, but we would only be able to fix ⌊(2 − 1)/2⌋ = 0 errors. Imagine that the word 101 were recognized instead: we would be able to detect an error, since 101 is not in the set of allowed words, but we would not be able to fix it, because the probabilities of the intended word being 001 or 100 would be the same, the Hamming distance to each being equal. However, if we change the allowed words to 000 and 111, the Hamming distance increases to 3 and we are able to detect 3 − 1 = 2 errors and even fix ⌊(3 − 1)/2⌋ = 1 error. If we now detected the word 101, we would know there was an error, since the word is not part of our set of allowed words, and we could fix it by transforming it into the allowed word with the lowest Hamming distance to the received word, i.e. 111.
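The detect-and-correct behaviour of these two toy codebooks can be reproduced with a short Python sketch (the helper names are mine, not from the thesis code):

```python
# Minimal sketch of minimum-distance decoding over small codebooks.

def hamming_distance(a, b):
    """Number of bit positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(word, codebook):
    """Return (nearest allowed word or None on a tie, error_detected)."""
    dists = sorted((hamming_distance(word, c), c) for c in codebook)
    detected = word not in codebook
    if detected and len(dists) > 1 and dists[0][0] == dists[1][0]:
        return None, detected  # tie between nearest words: not correctable
    return dists[0][1], detected

assert hamming_distance("001", "100") == 2

# d_H = 2: the error in 101 is detected but cannot be fixed (001 and 100 tie).
assert decode("101", {"001", "100"}) == (None, True)

# d_H = 3: the same received word 101 is detected and corrected to 111.
assert decode("101", {"000", "111"}) == ("111", True)
```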

4.3.1 Error Detection and Correction using Linear Codes

If such a set of allowed words is generated by multiplying the data bits with a generative matrix, we say that we are using a linear code. Mathematically, linear codes can be constructed as a subspace of a vector space with any number of elements. When generating linear codes for a binary system, Galois fields of 2 elements are usually used. These fields, usually written as GF(2), have the properties of closure, commutativity, associativity, identity, inverse and distributivity [44] and define the sum as the logical XOR operation and the multiplication as the logical AND operation, which makes the implementation in hardware efficient [45].

The allowed word set can be constructed by calculating:

$$\vec{b}^{\,T} = \vec{x}^{\,T} \cdot \bar{\bar{G}} \qquad (4.1)$$


where $\bar{\bar{G}}$ is the generative matrix, $\vec{x}$ the data words and $\vec{b}$ the generated allowed code words [45]. Note that finding an optimal $\bar{\bar{G}}$ for a given code length and data length is not always trivial, but there are published collections of best known matrices for given lengths (such as [46]). Also, [47] shows the maximum Hamming distance that can be achieved for given code and data lengths.

Table 4.2 shows recommendations for different numbers of bars and expected accuracies. The probability of a wrong correction is based on the binomial probability mass function and calculated as:

$$1 - \sum_{x=0}^{\left\lfloor\frac{d_H-1}{2}\right\rfloor} \binom{n}{x} \cdot p^x \cdot (1-p)^{n-x}, \qquad p = 0.0277 \qquad (4.2)$$

where $d_H$ is the Hamming distance, $n$ the number of bits ($20 \cdot$ number of bars) and $p$ the probability of incorrect classification of a substructure. The probability of an undetected error has been calculated with the formula:

$$1 - \sum_{x=0}^{d_H-1} \binom{n}{x} \cdot p^x \cdot (1-p)^{n-x}, \qquad p = 0.0277 \qquad (4.3)$$

Note that a set of words may have different Hamming distances depending on the region of the vector subspace where the decoded value was placed, and thus different probabilities of correct error fixing or detection. Therefore, some of the entries in Table 4.2 show a range of Hamming distances and probabilities instead of single values.
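Equations 4.2 and 4.3, together with the 42.98% figure from Section 4.2, can be checked with a small Python sketch (the function names are illustrative, not the thesis code):

```python
# Sketch of the probability calculations behind Equations 4.2 and 4.3,
# using the binomial probability mass function.
from math import comb

P_ERR = 0.0277  # probability of misclassifying one substructure

def p_at_most(k, n, p=P_ERR):
    """Probability of at most k bit errors among n bits (binomial CDF)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

def wrong_correction_prob(d_h, n):
    """Equation 4.2: probability of more errors than the code can correct."""
    return 1 - p_at_most((d_h - 1) // 2, n)

def undetected_error_prob(d_h, n):
    """Equation 4.3: probability of more errors than the code can detect."""
    return 1 - p_at_most(d_h - 1, n)

# At least one error in a single 20-bit bar (footnote 1): about 42.98%.
assert abs((1 - p_at_most(0, 20)) - 0.4298) < 1e-3

# First row of Table 4.2 (1 bar, d_H = 5): 1.703e-2 and 1.784e-4.
assert abs(wrong_correction_prob(5, 20) - 1.703e-2) < 1e-5
assert abs(undetected_error_prob(5, 20) - 1.784e-4) < 1e-6
```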

Obviously, increases in the accuracy of correct decoding, such as from the initial approximately 57% to 99.8% for one bar, or from 3.43% to 99.95% for 6 bars, come at a price. Instead of using all bits for data transfer, we now have to use some of them as redundancy bits to increase the Hamming distance and be able to correct or detect errors. The rightmost column of Table 4.2 shows how many bits are left for data transfer. For example, if you need 6 bars for your application and want to use error correction to achieve a maximal decoding error of 1 out of 2000, you would have 57 bits for data transfer (you would use the row with Hamming distances 21-28, since the wrong correction probability is at least 5.753·10⁻⁴). This would reduce the number of objects you can distinguish from 2¹²⁰ to 2⁵⁷.

Even if an error rate of 1 out of 2000 cannot compare to the error rates digital barcodes can achieve, it still surpasses the accuracy of human data entry operators, which is about 1 error per 300 keystrokes [48] (note that entering the value of a barcode would usually need several keystrokes).

Number of Bars | Hamming Distance | Wrong Correction Probability | Undetected Error Probability | Data Bits
1 | 5 | 1.703·10⁻² | 1.784·10⁻⁴ | 11
1 | 6 | 1.703·10⁻² | 1.252·10⁻⁵ | 10
1 | 7 | 1.998·10⁻³ | 7·10⁻⁷ | 9
2 | 7-8 | 2.436·10⁻² | 1.044·10⁻⁴ – 1.205·10⁻⁵ | 24
2 | 8 | 2.436·10⁻² | 1.205·10⁻⁵ | 23
2 | 9-10 | 4.786·10⁻³ | 1.204·10⁻⁶ – 1.051·10⁻⁷ | 20
3 | 9-11 | 2.53·10⁻² – 6.297·10⁻³ | 3.952·10⁻⁵ – 7.202·10⁻⁷ | 24
3 | 10-12 | 2.53·10⁻² – 6.297·10⁻³ | 5.642·10⁻⁶ – 8.276·10⁻⁸ | 23
3 | 11-12 | 6.297·10⁻³ | 7.201·10⁻⁷ – 8.276·10⁻⁸ | 20
4 | 11-15 | 2.38·10⁻² – 1.707·10⁻³ | 1.324·10⁻⁵ – 5.236·10⁻⁹ | 47
4 | 13-18 | 6.821·10⁻³ – 3.775·10⁻⁴ | 3.134·10⁻⁷ – 6.318·10⁻¹² | 42
4 | 16-19 | 1.71·10⁻³ – 7.453·10⁻⁵ | 6.004·10⁻¹⁰ – 5.801·10⁻¹³ | 40
4 | 17-21 | 1.707·10⁻³ – 1.324·10⁻⁵ | 6.386·10⁻¹¹ – 3.331·10⁻¹⁶ | 36
5 | 17-22 | 1.892·10⁻³ – 1.079·10⁻⁴ | 2.471·10⁻⁹ – 4.741·10⁻¹⁴ | 51
5 | 19-24 | 4.753·10⁻⁴ – 2.232·10⁻⁵ | 3.921·10⁻¹¹ – 4.299·10⁻¹⁶ | 49
6 | 19-25 | 1.939·10⁻³ – 3.116·10⁻⁵ | 1.018·10⁻⁹ – 3.725·10⁻¹⁵ | 64
6 | 21-28 | 5.348·10⁻⁴ – 6.653·10⁻⁶ | 1.951·10⁻¹¹ – 3.585·10⁻¹⁸ | 57
7 | 21-30 | 1.894·10⁻³ – 9.062·10⁻⁶ | 3.862·10⁻¹⁰ – 3.041·10⁻¹⁸ | 72
7 | 23-32 | 5.613·10⁻⁴ – 1.981·10⁻⁶ | 8.543·10⁻¹² – 2.958·10⁻²⁰ | 69
8 | 24-36 | 1.792·10⁻³ – 5.903·10⁻⁷ | 2.239·10⁻¹¹ – 2.426·10⁻²² | 80
8 | 25-38 | 5.623·10⁻⁴ – 1.239·10⁻⁷ | 3.436·10⁻¹² – 2.122·10⁻²⁴ | 75

Table 4.2: Linear code choice recommendations for different numbers of bars.

The standard form of the generative matrix is:

$$\bar{\bar{G}} = [\,I_k \mid \bar{\bar{P}}\,] \qquad (4.4)$$

where $I_k$ is the identity matrix of size $k \times k$ and $\bar{\bar{P}}$ is of size $k \times (n-k)$, with $k$ being the length of a data word and $n$ the length of an encoded word. Equation 4.4 can be used to calculate a so-called parity check matrix $\bar{\bar{H}}$, which fulfils

$$\vec{b}^{\,T} \cdot \bar{\bar{H}}^T = 0 \qquad (4.5)$$

for all $\vec{b}^{\,T}$ produced with Equation 4.1 and is different from 0 for all other words. This also implies that $\bar{\bar{G}} \cdot \bar{\bar{H}}^T = 0$, which means that $\bar{\bar{H}}$ must be of the form

$$\bar{\bar{H}} = (\,\bar{\bar{P}}^T \mid \bar{\bar{I}}_{n-k}\,) \qquad (4.6)$$

Note that the formula should actually be $\bar{\bar{H}} = (-\bar{\bar{P}}^T \mid \bar{\bar{I}}_{n-k})$, but thanks to the properties of the logical XOR, the minus sign can be ignored. Once we have $\bar{\bar{H}}$, a recognized word $\vec{b}^{\,*}$ can be tested for errors with:

$$\vec{s}^{\,T} = \vec{b}^{\,*T} \cdot \bar{\bar{H}}^T = \vec{b}^{\,*T} \cdot \begin{pmatrix} \bar{\bar{P}} \\ \bar{\bar{I}}_{n-k} \end{pmatrix} = \left[\,(b^*_1, b^*_2, \ldots, b^*_k) \cdot \bar{\bar{P}} \oplus (b^*_{k+1}, b^*_{k+2}, \ldots, b^*_n) \cdot \bar{\bar{I}}_{n-k}\,\right] \qquad (4.7)$$

The vector $\vec{s}$ is called a syndrome and is zero if a correct word was recognized (due to Equation 4.5). The vector $\vec{b}^{\,*}$ is the sum of the word that should have been recognized, $\vec{b}$, and an error vector $\vec{e}$ (which is $\vec{0}$ if the recognition was successful):

$$\vec{b}^{\,*} = \vec{b} + \vec{e} \qquad (4.8)$$

Putting everything together, we get that

$$\vec{s}^{\,T} = (\vec{b}^{\,T} + \vec{e}^{\,T}) \cdot \bar{\bar{H}}^T = \vec{b}^{\,T} \cdot \bar{\bar{H}}^T + \vec{e}^{\,T} \cdot \bar{\bar{H}}^T = \vec{e}^{\,T} \cdot \bar{\bar{H}}^T \qquad (4.9)$$

which can be used to find the source of the error and correct it [45, 49]. One simple method is to create a table of all possible errors and their syndromes and to consult it when an erroneous word has been received. Using Equation 4.9 and knowing the error that generates a given syndrome, all erroneous bits in the recognized word can simply be flipped to obtain the intended word.
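As a concrete illustration of this table-based method, the following Python sketch builds the classic Hamming(7,4) code, whose generator and parity check matrices follow the standard forms of Equations 4.4 and 4.6. It is a toy stand-in for the larger codes of Table 4.2, not the thesis implementation:

```python
# Syndrome-table decoding over GF(2) for the classic Hamming(7,4) code
# (k = 4, n = 7, d_H = 3): one error correctable, two detectable.

K, N = 4, 7
P = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]   # k x (n-k) parity part

# G = [I_k | P]   (Equation 4.4)
G = [[int(i == j) for j in range(K)] + P[i] for i in range(K)]
# H = (P^T | I_{n-k})   (Equation 4.6)
H = [[P[i][j] for i in range(K)] + [int(j == r) for r in range(N - K)]
     for j in range(N - K)]

def encode(x):
    """b^T = x^T . G over GF(2): AND for products, XOR via mod 2 (Eq. 4.1)."""
    return [sum(x[i] & G[i][j] for i in range(K)) % 2 for j in range(N)]

def syndrome(b):
    """s^T = b^T . H^T over GF(2) (Equation 4.7)."""
    return tuple(sum(b[i] & H[j][i] for i in range(N)) % 2 for j in range(N - K))

# Table mapping the syndrome of each single-bit error to its position (Eq. 4.9).
SYNDROMES = {syndrome([int(i == pos) for i in range(N)]): pos
             for pos in range(N)}

def correct(b):
    """Flip the erroneous bit indicated by the syndrome, if any."""
    s = syndrome(b)
    if any(s):
        pos = SYNDROMES[s]
        b = b[:pos] + [b[pos] ^ 1] + b[pos + 1:]
    return b

codeword = encode([1, 0, 1, 1])
corrupted = codeword[:2] + [codeword[2] ^ 1] + codeword[3:]   # flip bit 2
assert syndrome(codeword) == (0, 0, 0)   # valid words have a zero syndrome
assert correct(corrupted) == codeword    # single error located and fixed
```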


4.3.2 Evaluation of Error Correction by Exploiting Model Confidence

In some cases, it might be suitable to use the model's confidence to try to correct errors. The idea is to use an error detection technique, for example a linear code, to detect the number of errors that have occurred and then flip that many bits, selecting the ones where the substructure classification model has the lowest confidence.

This technique could drastically increase the number of usable data bits. Note that a Hamming distance of only $n + 1$ would be needed to fix $n$ errors, instead of $2 \cdot n + 1$. However, using the validation dataset, I have found that if one error occurs, the probability of fixing it correctly is only 61%. If a second error were to occur, the probability of fixing both would decrease even further, reaching an approximate probability of successful error correction of 5.5%. Therefore, with the current substructure classification model, this technique is not advised.
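The bit-flipping step itself is straightforward. A hedged sketch, assuming the number of errors has already been reported by a separate error-detecting code (the function name and inputs are illustrative):

```python
# Illustrative sketch of the confidence-based correction idea (which, as
# noted above, is not advised with the current model): once a detecting
# code reports k errors, flip the k bits with the lowest model confidence.

def flip_least_confident(bits, confidences, num_errors):
    """Flip the num_errors bits in which the classifier was least confident."""
    order = sorted(range(len(bits)), key=lambda i: confidences[i])
    flipped = list(bits)
    for i in order[:num_errors]:
        flipped[i] ^= 1
    return flipped

bits        = [1, 0, 1, 1, 0]
confidences = [0.99, 0.97, 0.52, 0.95, 0.98]  # bit 2 was a low-confidence call
assert flip_least_confident(bits, confidences, 1) == [1, 0, 0, 1, 0]
```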


Chapter 5

Future Lines of Research

In this thesis a novel hand-drawn barcode has been presented. To develop the proposed structure, the Omniglot dataset was explored using the simplicity-by-speed axiom. The outcome of the exploration, together with numerical and perceptual considerations, led to a barcode subdivided into bars of 20 substructures each. Further research or user studies on human drawing capabilities could help create an improved hand-drawn barcode structure with higher data density, for example by creating alternative substructures and thus changing the base of the represented number. Another possibility would be to build a model that predicts the complexity of a barcode and use it to refine the current barcode structure.

In Chapter 3, a procedure to detect and classify the barcode's substructures has been demonstrated and evaluated. The training of the object detection and classification models was limited to a relatively small number of samples. Once more training samples have been collected, an improvement in the precision of the models is to be expected. This would lead to a reduction of the redundant bits recommended in Chapter 4.

The presented hand-drawn barcode, reading mechanism and coding techniques should not be seen as a final version, but as a first step with great improvement potential.


Appendix A

Additional Information

A repository containing the images, source code and configuration files related to this thesis can be found at: https://github.com/dkk/Hand-Drawn-Barcode.


Bibliography

[1] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards

real-time object detection with region proposal networks, 2015.

[2] Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, and

Qi Tian. Centernet: Keypoint triplets for object detection, 2019.

[3] Mingxing Tan, Ruoming Pang, and Quoc V. Le. Eﬃcientdet: Scalable and

eﬃcient object detection, 2019.

[4] Fritz J. and Dolores H. Russ. Executive summary: Code 16k and code

49 data integrity test. https://www.idautomation.com/Assets/pdf-links/

OSU-Data-Integrity-Linear.pdf (Accessed 2020-09-26).

[5] Hongyu Wang and Guangcun Shan. Recognizing handwritten mathematical

expressions as latex sequences using a multiscale robust neural network, 2020.

[6] Adam Byerly, Tatiana Kalganova, and Ian Dear. A branching and merging

convolutional network with homogeneous ﬁlter capsules, 2020.

[7] Abdul Mueed Haﬁz and Ghulam Mohiuddin Bhat. Reinforcement learning based

handwritten digit recognition with two-state q-learning, 2020.

[8] Peng Xu, Yongye Huang, Tongtong Yuan, Tao Xiang, Timothy M. Hospedales,

Yi-Zhe Song, and Liang Wang. On learning semantic representations for million-

scale free-hand sketches, 2020.

[9] Dat Quoc Nguyen, Thanh Vu, and Anh Tuan Nguyen. Bertweet: A pre-trained

language model for english tweets, 2020.

[10] Ikuro Sato, Hiroki Nishimura, and Kensuke Yokoi. Apac: Augmented pattern

classiﬁcation with neural networks, 2015.

[11] Alireza Rezvanifar, Melissa Cote, and Alexandra Branzan Albu. Symbol spotting

on digital architectural ﬂoor plans using a deep learning-based framework, 2020.

[12] William Adorno III, Angela Yi, Marcel Durieux, and Donald Brown. Hand-drawn

symbol recognition of surgical ﬂowsheet graphs with deep image segmentation,

2020.


[13] Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human-

level concept learning through probabilistic program induction. Science,

350(6266):1332–1338, 2015.

[14] Liwei Wang, Yan Zhang, and Jufu Feng. On the euclidean distance of images.

IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8):1334–

1339, 2005.

[15] Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality

assessment: from error visibility to structural similarity. IEEE Transactions on

Image Processing, 13(4):600–612, 2004.

[16] M.-P. Dubuisson and A. K. Jain. A modified Hausdorff distance for object matching. In Proceedings of 12th International Conference on Pattern Recognition, volume 1, pages 566–568, 1994.

[17] C. E. Shannon. A mathematical theory of communication. The Bell System

Technical Journal, 27(3):379–423, 1948.

[18] Mohammed Aljanabi, Zahir Hussain, and Songfeng Lu. An entropy-histogram

approach for image similarity and face recognition. Mathematical Problems in

Engineering, 2018:1–18, 07 2018.

[19] Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Omniglot

data set for one-shot learning. https://github.com/brendenlake/omniglot

(Accessed 2020-07-28).

[20] Veijo Virsu. Tendencies to eye movement, and misperception of curvature, di-

rection, and length. Perception and Psychophysics, 9:65–72, 01 1971.

[21] Barcode Island. Code 128 symbology. http://www.barcodeisland.com/

code128.phtml (Accessed 2020-08-02).

[22] Deia Ganayim. Visual processing of connected and unconnected letters and words

in arabic. Cognitive Linguistic Studies, 2:205–238, 01 2015.

[23] Daniel Klöck. Hand-drawn barcode user study. https://

barcode-dataset-generator.herokuapp.com/ (Accessed 2020-09-18).

[24] Python.org. https://www.python.org/ (Accessed 2020-09-18).

[25] Flask: A lightweight wsgi web application framework. https://

palletsprojects.com/p/flask/ (Accessed 2020-09-18).

[26] Skeleton: Responsive css boilerplate. http://getskeleton.com/ (Accessed

2020-09-18).

[27] Cloudinary: Image and video upload, storage, optimization and cdn. https:

//cloudinary.com/ (Accessed 2020-09-18).


[28] Heroku: Cloud application platform. https://www.heroku.com/ (Accessed

2020-09-18).

[29] Rectlabel for object detection. https://rectlabel.com (Accessed 2020-09-18).

[30] Tensorﬂow 2 detection model zoo. https://github.com/tensorflow/models/

blob/master/research/object_detection/g3doc/tf2_detection_zoo.md

(Accessed 2020-09-18).

[31] Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. CoRR, abs/1405.0312, 2014.

[32] Ross Girshick. Fast r-cnn, 2015.

[33] Ross Girshick, Jeﬀ Donahue, Trevor Darrell, and Jitendra Malik. Rich feature

hierarchies for accurate object detection and semantic segmentation, 2013.

[34] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learn-

ing for image recognition, 2015.

[35] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional

networks, 2013.

[36] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for

large-scale image recognition, 2014.

[37] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet:

A large-scale hierarchical image database. In 2009 IEEE conference on computer

vision and pattern recognition, pages 248–255. Ieee, 2009.

[38] Faster r-cnn: Down the rabbit hole of modern ob-

ject detection. https://tryolabs.com/blog/2018/01/18/

faster-r-cnn-down-the-rabbit-hole-of-modern-object-detection/

(Accessed 2020-09-26).

[39] Xingyi Zhou, Dequan Wang, and Philipp Krähenbühl. Objects as points, 2019.

[40] Mingxing Tan and Quoc V. Le. Eﬃcientnet: Rethinking model scaling for con-

volutional neural networks, 2019.

[41] A thorough breakdown of eﬃcientdet for ob-

ject detection. https://towardsdatascience.com/

a-thorough-breakdown-of-efficientdet-for-object-detection-dc6a15788b73

(Accessed 2020-09-26).

[42] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimiza-

tion, 2014.


[43] R. W. Hamming. Error detecting and error correcting codes. The Bell System Technical Journal, 29(2):147–160, 1950.

[44] Suayb S. Arslan. Finite ﬁelds and linear codes. http://www.suaybarslan.com/

classnotes2.pdf (Accessed 2020-10-12).

[45] Prof. Dr.-Ing. Gerald Oberschmidt. Grundlagen der Übertragungstechnik,

kapitel 5: Datensicherung und kodierung. http://dualplus.de/ueb_19/

uebertragung.pdf (Accessed 2020-10-12).

[46] Markus Grassl. Searching for linear codes with large minimum distance. In Wieb

Bosma and John Cannon, editors, Discovering Mathematics with Magma — Re-

ducing the Abstract to the Concrete, volume 19 of Algorithms and Computation

in Mathematics, pages 287–313. Springer, Heidelberg, 2006.

[47] Markus Grassl. Bounds on the minimum distance of linear codes and quantum

codes. Online available at http://www.codetables.de, 2007. Accessed on 2020-

10-11.

[48] Barcode reading and accuracy. https://www.labce.com/spg650115_barcode_

reading_and_accuracy. (Accessed 2020-10-12).

[49] Richard E. Blahut. Linear Block Codes, page 49–66. Cambridge University Press,

2003.
