A Deeper Look at Bongard Problems
Xinyu Yun, Tanner Bohn, and Charles Ling
Western University, London ON N6A 3K7, Canada
{xyun,tbohn,charles.ling}@uwo.ca
Abstract. Machine learning, especially deep learning, has been success-
fully applied to a wide array of computer vision classification tasks in
recent years. Infamous for requiring massive amounts of data to perform
well at image classification problems, deep learning has so far been un-
able to solve Bongard problems (BPs), a set of abstract visual reasoning
tasks invented in the 1960s. Each BP can be seen as a supervised learning
task, with few training samples (6 for positive and 6 for negative), and
often requiring highly abstract features to learn well. Automatically solv-
ing Bongard problems directly from images remains an ambitious goal,
with very little machine learning literature on the topic. In this paper,
we discuss several special properties of BPs as well as what it means to
solve a BP. Making use of an expanded set of BP-like tasks to allow for a
more careful evaluation of automated solvers, we develop and benchmark
a deep learning based approach to solve these problems. To encourage
work on this interesting problem, we also make freely available a dataset
of over 200 BPs¹.
Keywords: Bongard Problems · Convolutional Neural Networks · Feature Extraction · Few-Shot Learning.
1 Introduction
Despite recent successes in machine learning on many problems previously con-
sidered beyond the reach of artificial intelligence, tasks requiring divergent think-
ing, abstraction, and few-shot learning continue to be a challenge. While other
tasks requiring one or more of these properties have seen recent attention and
progress [12, 9], Bongard problems (BP), which appear to require the solver to
possess all three of these skills, continue to be largely unstudied. Created in the
1960s by Mikhail Bongard, these problems were designed to demonstrate the
inadequacy of the standard pattern recognition tools of the day for achieving
human-level visual cognition [1].
A typical Bongard problem consists of 12 tiles evenly divided into a left and a
right class. To gauge the cognitive abilities of a test subject, the subject is shown
the 12 tiles and then asked to provide a rule which distinguishes the tiles appear-
ing on one side from the tiles on the other. For example, the intended rule for
¹ https://github.com/XinyuYun/bongard-problems
Fig. 1. Examples of easy, intermediate, and difficult Bongard problems.
the second problem in Figure 1 is ‘clockwise spirals on the left, counterclockwise
spirals on the right’.
As a classification task, BPs possess several properties which make them both interesting and difficult with respect to machine learning. A few of these properties are shared with other well-studied tasks; however, other properties establish Bongard problems as uniquely difficult².
Divergent thinking. The three Bongard problems in Figure 1, ranging from easy to difficult, demonstrate the typical variation, both visually and in terms of solutions. Since there are a very large number of potential features to consider and many ways these features can be combined to define different rules, divergent thinking is required to perform well at Bongard problems. This property is also partially shared by the Raven's Progressive Matrices (RPM) task [11], where deciding upon the tile which best completes the matrix requires considering many alternative hypotheses, using a fixed set of visual features and sequence progression relations, to find the one requiring the simplest justification. There is considerably more diversity in the visual elements and rule types in Bongard problems.
Abstract thinking. To solve the second problem in Figure 1, recognizing that the shapes have the characteristic of spiraling requires abstract thinking, because the property of spiraling is not physically present, but exists as a non-trivial relationship between points on a curve. The patterns that must be identified to solve a problem are often not directly visible, but exist as a complex relationship between other abstract features. For example, finding the intended rule for the third problem in Figure 1 likely requires observing that the individual shapes of a particular type should be grouped together to form the outlines of larger shapes.
Few-shot learning. To recognize that all objects on a given side share
one potentially complex property among innumerable alternatives given only six
samples per class requires few-shot learning. In contrast, datasets for image clas-
sification problems often have orders of magnitude more samples per class. This
few-shot learning property is shared with both the popular Omniglot task (concerned with classification of hand-written characters) [9] and Raven's Progressive Matrices (matrix completion) [11].
² A description of what does and does not make for a valid BP can be found here: http://www.foundalis.com/res/invalBP.html.
For most of these properties, machine learning has had some success on asso-
ciated problems. However, when multiple properties are present, as in the case
of BPs, learning to automatically solve the tasks becomes much more difficult.
Due to this large performance gap and the unique challenges of BPs, we believe
studying BPs is an efficient route towards reaching human-level performance
across a variety of tasks.
Towards this end, the main contributions of the present work are as follows.
– We adapt a deep learning based approach to solve Bongard problems and
overcome weaknesses in previous approaches (Section 3).
– We consider the set of properties which make BPs uniquely difficult and pro-
pose a set of metrics for automatic evaluation of BP solvers, which interprets
BPs as few-shot classification tasks (Section 4).
– We evaluate our deep-learning based approaches on the BPs while examining
the effects of pre-training and feature extraction methods (Section 5).
2 Related Work
Whether due to the difficulty of automatically solving BPs or a lack of awareness of them, few attempts at the task have been made.
Motivated by the appearance of Bongard problems in Gödel, Escher, Bach
[6], Hofstadter’s own graduate student, Harry Foundalis, decided to approach
the problem of automatically solving them in his dissertation [5]. Foundalis’
approach consists of a cognitive architecture for visual pattern recognition called
Phaeaco, which tries to solve BPs with the following process. First, working at
the pixel level, Phaeaco attempts to explicitly extract the geometric primitives
contained in each of the 12 tiles of a problem. Next, features shared among the
tiles for each side are identified. This is repeated either until a rule is found
or some stopping criterion is reached. The Phaeaco model is capable of finding solutions for up to 15 problems out of 200³. Due to the non-deterministic nature of the program, the success rate on each of these problems varies dramatically, between 6% and 100%.
A more recent approach to solving Bongard problems is provided by [3]. Sim-
ilar to Phaeaco, their pipeline begins with explicit extraction of visual features.
Additionally, these visual features are then translated into a symbolic visual vo-
cabulary. Candidate rules which split the 12 tiles are scored based on the prior probabilities assigned to the grammar's production rules that generated them, such that shorter, less complex rules are preferred. Because the vocabulary restricts which rules can be expressed, only 39 BPs are considered; the approach solves 35 of them.
A recent approach utilizing deep learning to solve BPs was proposed in an
intriguing blog post by [7]⁴. While this approach does not entirely avoid manu-
ally defining the type of visual features that are important to consider, it comes
³ Phaeaco results can be found here: http://www.foundalis.com/res/solvprog.htm.
⁴ https://k10v.github.io/2018/02/25/Solving-Bongard-problems-with-deep-learning/
close, and is the inspiration for the model we present in Section 3. Kharagorgiev's approach works roughly as follows: first, an image dataset of simple shapes is automatically constructed and used to train a convolutional neural network (CNN), which serves as domain knowledge. Second, a feature vector is extracted for each of the 12 tiles by taking the CNN's globally-averaged feature maps, as proposed in [10], and binarized with a manually chosen threshold. Finally, finding a solution to a BP is then reduced to locating a feature where all tiles from each side have the same binary value, unique to that side. Of the 232 problems assembled by Foundalis⁵, 47 are considered solved, 41 of them correctly.
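To make this last step concrete, the minimal sketch below (ours, not Kharagorgiev's code) searches for a single binarized feature that separates the two sides; it assumes the 6+6 globally-averaged feature vectors have already been computed, and the function name and threshold value are purely illustrative.

```python
import numpy as np

def find_single_feature_rule(left_feats, right_feats, threshold=0.1):
    """Search for one binarized feature that separates the two sides.

    left_feats, right_feats: arrays of shape (6, n_features) holding the
    globally-averaged CNN activations for the six tiles of each side.
    threshold: hypothetical binarization cutoff (chosen manually in the
    blog post). Returns the index of a separating feature, or None.
    """
    left_bin = left_feats > threshold    # (6, n_features) booleans
    right_bin = right_feats > threshold
    for j in range(left_bin.shape[1]):
        # all left tiles agree, all right tiles agree, and the sides differ
        if left_bin[:, j].all() and not right_bin[:, j].any():
            return j
        if right_bin[:, j].all() and not left_bin[:, j].any():
            return j
    return None
```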
3 Our Models
Due to the uniqueness (both visually and in terms of solutions) of Bongard
problems and the small size of the problem set compiled over the years (currently
around 300), training a meta-learning model on a subset of the problems and applying it to new problems is difficult without overfitting to the specific rule types present in the training data. These properties make recent state-of-the-art approaches for few-shot classification problems [14] ill-suited for Bongard problems. In an attempt to overcome these hurdles, we apply transfer learning, a
common deep learning based approach to learning with small data. The approach
we take is to pre-train a convolutional neural network with synthetic images
that contain visual features commonly present in BP tiles, then train a simple
classifier on feature vectors for the 12 tiles produced by the CNN feature maps.
Figure 2 provides a high level view of this process.
Fig. 2. Bongard problem solver pipeline, with examples of pre-training samples.
Pre-training. Pre-training for image classification, as described in [4], pop-
ularized the insight that rather than learning to perform a new classification
⁵ The set of original BPs by Mikhail Bongard as well as those proposed by others can
be found here: http://www.foundalis.com/res/bps/bpidx.htm.
A Deeper Look at Bongard Problems 5
task from scratch, one can take advantage of knowledge coming from previously
learned categories. By training a machine learning model to perform one task,
it may implicitly discover features useful to learning to perform another similar
task. Compared to past approaches to solving BPs which extracted visual and
abstract features using hard-coded feature detectors and routines [3, 5], we can
influence what patterns a CNN discovers by simply augmenting the training task
to require discovering those patterns, a much easier task than manually writing
algorithms to detect those particular features. To ensure that the features we extract from the feature maps are relevant to the visual patterns present in BPs, we pre-train the CNN on a related task: shape classification. Figure 2 shows some pre-training samples as well. In Section 5.2, we examine the effects of increasing the variety of shapes on final BP solver performance.
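As a rough illustration of how such pre-training tiles can be generated, the sketch below draws a single black shape on a blank 96x96 canvas with Pillow. The shape classes, coordinate ranges, and helper name here are illustrative only; the actual pre-training sets (Table 1) contain up to 25 classes of lines, dots, curves, and n-gons.

```python
import numpy as np
from PIL import Image, ImageDraw

def random_tile(shape_class, size=96, rng=np.random):
    """Draw one synthetic pre-training tile: a single black shape on white."""
    img = Image.new("L", (size, size), 255)             # white 96x96 canvas
    draw = ImageDraw.Draw(img)
    x0, y0 = (int(v) for v in rng.randint(10, 40, size=2))
    x1, y1 = (int(v) for v in rng.randint(55, 85, size=2))
    if shape_class == "line":
        draw.line([x0, y0, x1, y1], fill=0, width=2)
    elif shape_class == "ellipse":
        draw.ellipse([x0, y0, x1, y1], outline=0, width=2)
    elif shape_class == "triangle":
        x2, y2 = (int(v) for v in rng.randint(10, 85, size=2))
        draw.polygon([x0, y0, x1, y1, x2, y2], outline=0)
    # return as a 96x96x1 float array, matching the CNN input format
    return np.asarray(img, dtype=np.float32)[..., None] / 255.0
```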
Feature Extraction. To extract features for a given BP tile, we use globally-averaged feature map activations, which compute the spatially averaged activation value for each kernel [10]. The magnitude of a globally-averaged value
can be interpreted as the prevalence of a particular feature in the input image,
with features in earlier layers often corresponding to simple visual features and
later layers detecting features corresponding to more abstract concepts specific
to the dataset and task [15]. In Section 5.2, we examine the effects of extracting
features from layers of different depths in the pre-trained CNNs.
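A minimal Keras sketch of this extraction step is shown below. It assumes a pre-trained model whose convolutional layers are named conv_0 to conv_3 (as in Figure 4); the helper name itself is ours.

```python
import numpy as np
from tensorflow import keras

def tile_embedding(cnn, tile, layer_names=("conv_3",)):
    """Globally-averaged feature map activations for one 96x96x1 tile.

    cnn: pre-trained shape-classification model; layer_names: convolutional
    layers whose feature maps are read (deeper layers give more abstract features).
    """
    extractor = keras.Model(
        inputs=cnn.input,
        outputs=[cnn.get_layer(name).output for name in layer_names])
    maps = extractor.predict(tile[None, ...], verbose=0)
    if not isinstance(maps, list):                       # single requested layer
        maps = [maps]
    # spatial mean over each of the 64 kernels of every requested layer
    return np.concatenate([m.mean(axis=(1, 2)).ravel() for m in maps])
```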
Classification. After calculating feature vectors for each tile in a BP, we
train a classifier to distinguish between the two classes. While any classifier may
be used, careful consideration should be given to the type of rule we
want it to learn. In Section 4, we discuss the different types of solutions and rules
to Bongard problems. In Section 5.2, we also observe the effects of the classifier
on performance.
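A sketch of the two classifier choices we use later (Section 5) is given below, using scikit-learn; the left-before-right ordering of the 12 feature vectors is an assumption of this snippet, not a requirement of the method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def fit_bp_classifier(tile_features, single_feature=True, C=1.0):
    """Fit a rule on one BP from its 12 tile embeddings.

    tile_features: array of shape (12, n_features); by convention the first
    six rows are left tiles (label 0) and the last six are right tiles (label 1).
    single_feature=True gives the PT+SF-style depth-1 decision tree; otherwise
    an l2-regularized logistic regression (PT+LR-style) with inverse
    regularization strength C is used.
    """
    y = np.array([0] * 6 + [1] * 6)
    if single_feature:
        clf = DecisionTreeClassifier(max_depth=1)
    else:
        clf = LogisticRegression(penalty="l2", C=C, max_iter=1000)
    return clf.fit(tile_features, y)
```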
4 Evaluating Bongard Problem Solvers
To understand how to automatically evaluate a BP solver, it helps to under-
stand what properties a solution may possess. In the present work, we consider
proposed solutions to possess (or lack) the following properties.
Validity. We consider a proposed rule to be valid if it is able to correctly
split (classify) the original 12 tiles, and invalid otherwise. We consider a rule
to be a condition that can categorize tiles into left or right (either correctly or
incorrectly), whereas a solution is a rule which is valid and can thus correctly
categorize the 12 original tiles.
Robustness. We consider a solution to be robust if it is able to correctly classify not only the original 12 tiles, but also additional ones that are assigned to the left or right side according to the intended rule, defined by the author of the problem.
Simplicity. An intuitive definition, although often impractical to use for
evaluation, is that a simple solution takes few words to state. The opposite of a
simple rule is a complex rule.
Figure 3 illustrates how valid rules (solutions) to a given problem may vary
in robustness and simplicity. In Section 4.1 we discuss how to evaluate a solver
6 Xinyu Yun, Tanner Bohn, and Charles Ling
Fig. 3. Bongard #5 and various valid solutions (assuming each tile is 100px by 100px).
with respect to validity, and in Sections 4.2 and 4.3, we discuss evaluation with
respect to robustness and simplicity.
4.1 Measuring Validity
A model is said to produce a valid solution for a BP if the proposed rule correctly
splits the 12 tiles into two groups. This corresponds to the evaluation method
used by [5] and [3] (without the next step of subjective analysis). To condense
the validity performance of a model into a single value, we average the validity
scores across a set of Bongard problems:
$$\mathrm{validity} = \frac{1}{\#\mathrm{BPs}} \sum_{p \in \mathrm{BPs}} p_{CC} \qquad (1)$$

where $p_{CC} = 1$ if the proposed rule correctly classifies all 12 tiles of problem $p$, and 0 otherwise.
To accompany the validity score, we consider the average problem number
where a valid solution is found. This allows us to observe whether our models
have a bias, similar to humans, towards solving easy problems more often than difficult ones. This
works due to the trend of problem difficulty increasing with problem number in
the set of 200 BPs compiled by [5].
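Under this reading of Eq. (1), the validity score and the accompanying average problem number can be computed as in the hypothetical helper below.

```python
import numpy as np

def validity_metrics(solved_flags, problem_numbers):
    """Validity (Eq. 1) and average problem number of validly solved BPs.

    solved_flags: per problem, True when the proposed rule classifies all
    12 original tiles correctly (p_CC = 1).
    problem_numbers: the BP index of each problem; difficulty tends to grow
    with this number in the set compiled by Foundalis.
    """
    p_cc = np.asarray(solved_flags, dtype=float)
    validity = p_cc.mean()
    solved_numbers = np.asarray(problem_numbers)[p_cc == 1.0]
    avg_bp_number = solved_numbers.mean() if solved_numbers.size else float("nan")
    return validity, avg_bp_number
```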
4.2 Measuring Robustness
Since the only way a solution can be robust is with respect to the intended
solution, we use a functional definition of robustness. If a rule is able to correctly
classify unseen samples from each class then it can be considered robust.
Here we define the subset BPs(v) to represent the Bongard problems for which our model found a valid solution, and #BPs(v) to denote the number of such problems. We average the robustness score over BPs(v):

$$\mathrm{robustness} = \frac{1}{\#\mathrm{BPs}(v)} \sum_{p \in \mathrm{BPs}(v)} \mathit{newTiles}_{CC} \qquad (2)$$

where $\mathit{newTiles}_{CC}$ is the fraction of new tiles for $p$ correctly classified.
As noted by [3], Bongard problems are unlike usual classification problems
in that the small number of examples for each class are often carefully chosen
to have a single property in common while ruling out as many alternatives as
possible. Leaving out even one or two tiles opens up the possibility of finding
many non-intended solutions. Additionally, this interpretation of robustness ig-
nores the case where a rule acts unexpectedly when presented with tiles that
do not clearly belong to either side. If left vs right is circles vs. squares, what
does it mean if a picture of a lamp is classified left? We therefore only consider
robustness under the assumption that all tiles presented will belong to either the
left or right.
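For completeness, a matching sketch for Eq. (2) is shown below; it assumes the fitted per-problem classifiers and the additional test tiles (already embedded as feature vectors) are available, and it averages only over the validly solved problems BPs(v).

```python
import numpy as np

def robustness_metric(clf_by_solved_bp, new_tiles_by_bp, new_labels_by_bp):
    """Robustness (Eq. 2): mean accuracy on unseen tiles, over BPs(v) only.

    clf_by_solved_bp: fitted classifier for each BP with a valid solution.
    new_tiles_by_bp / new_labels_by_bp: the extra test tiles (feature vectors)
    and their intended-side labels for those same problems.
    """
    per_bp = []
    for clf, X_new, y_new in zip(clf_by_solved_bp, new_tiles_by_bp, new_labels_by_bp):
        per_bp.append(np.mean(clf.predict(X_new) == np.asarray(y_new)))
    return float(np.mean(per_bp)) if per_bp else float("nan")
```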
4.3 Measuring Simplicity
Measuring the simplicity of a tile classification rule learned by an automated
solver may be extremely difficult. This problem of interpreting how a deep learn-
ing model works is well studied with regards to image classification and often
done with saliency maps, which show the parts of the input image which most
influence the classification results [15, 13]. In the present work, we do not at-
tempt to define a rule simplicity measure, however, in Section 5.3, we consider
visualizing activation maps to gain insight into the types of rules discovered by
our models.
5 Experiments and Results
In this section we analyze the performance and effects of hyperparameters of
two variations of our problem solving model. The first model, PT+SF, uses
pre-training and single feature classification (a decision tree of depth 1). Second
is PT+LR, which also utilizes pre-training, but can propose rules combining
many features using a logistic regression classifier.
First we discuss the experimental setup in Section 5.1, then we discuss ob-
servations made in 5.2, and in Section 5.3 we produce visualizations of the rules
implicitly learned by a solver and examine their utility.
5.1 Setup
To observe the effect of feature abstraction on BP solver performance with as
few confounding variables as possible, we use the same hyperparameters for each
of the convolutional layers (architecture shown in Figure 4):
Fig. 4. Neural network architecture used. An input tile (96x96x1) passes through four convolutional layers (conv_0 to conv_3) with the same hyperparameters, followed by a flatten layer and a dense layer with softmax output.
– 64 kernels of size 3x3 with stride 1, ReLU non-linearity, followed by 2x2 max-pooling with stride 2.
– For the PT+SF and PT+LR models, the output of the last convolutional layer is mapped to shape class probabilities with a dense layer and softmax activation.
The models are trained with categorical cross-entropy loss and the Adam
optimizer [8] with the default hyperparameters defined by Keras [2].
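A Keras sketch of this network and training configuration is given below. The padding mode is not stated in the text and is an assumption here, as is the helper's name; the number of shape classes follows Table 1.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_pretraining_cnn(num_shape_classes):
    """Four-block CNN of Figure 4 (sketch); Adam uses the Keras defaults."""
    inputs = keras.Input(shape=(96, 96, 1))
    x = inputs
    for i in range(4):
        # 64 kernels of size 3x3, stride 1, ReLU, then 2x2 max pooling (stride 2)
        x = layers.Conv2D(64, 3, strides=1, activation="relu",
                          padding="same", name=f"conv_{i}")(x)
        x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
    x = layers.Flatten()(x)
    outputs = layers.Dense(num_shape_classes, activation="softmax",
                           name="output")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```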
We use 100,000 tiles (80/20 train/validation split) of size 96x96x1. Table 1
contains the details of the five different pre-training data types of increasing
complexity we experiment with and the number of training epochs we found
to produce stable validation scores without overfitting. The final validation ac-
curacy ranged from 100% for the easiest pre-training set to 93% for the most
complex set.
Table 1. Pre-training data details.

Type  Shape classes                                               # Shape classes  Training epochs
1     Single-segmented lines, dots, curves                                      3                3
2     #1, circles, ellipses                                                     7                6
3     #2, 3-gons, equilateral 3-gons                                           11               20
4     #3, 2- and 3-segmented lines, 4-gons, equilateral 4-gons                 17               20
5     #4, 5- and 6-gons, equilateral 5- and 6-gons                             25               20
To evaluate the overall validity of our models, we incrementally combine and keep the features from each convolutional layer, together with the output layer (whose small number of shape-class probabilities may carry simple shape information for solving certain BPs), to obtain consistent evaluation results. The layer combinations are:

output, output + CL3, output + CL3 + CL2, ...
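These incrementally combined feature sources can be enumerated as in the small helper below (layer names follow Figure 4; the 'output' entry stands for the softmax class probabilities, which are appended directly rather than spatially averaged).

```python
def layer_combinations():
    """Feature-source combinations evaluated in Tables 2 and 3."""
    sources = ["output", "conv_3", "conv_2", "conv_1", "conv_0"]
    # ["output"], ["output", "conv_3"], ..., ["output", ..., "conv_0"]
    return [sources[: i + 1] for i in range(len(sources))]
```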
5.2 Results and Observations
First, we evaluate the validity scores and the average values of BP#(v), as defined in Section 4.1, for the 200 BPs. We then manually created two additional test tiles for each problem (one for each side) in order to estimate robustness on the problems each model solves validly. All experimental results for PT+SF and PT+LR are averaged across 5 trials.
In Table 2 and Table 3 we see the validity and robustness for the proposed
PT+SF and PT+LR models, containing the results with respect to the pre-
training type and combined layers for feature extraction.
Table 2. Effects of pre-training and CNN combined layers used for feature extraction on PT+SF performance, averaged over 5 trials. CLi refers to the ith convolutional layer.

Validity (avg BP#(v)), by pre-training type:
Layers                      1           2           3           4           5
output                      1.2% (89)   3.3% (50)   3% (21)     2.6% (49)   1.9% (66)
output+CL3                  14.3% (78)  18.6% (84)  19.5% (79)  22.1% (85)  19.2% (84)
output+CL3+CL2              18.8% (83)  22.1% (86)  25% (84)    27% (90)    24.7% (90)
output+CL3+CL2+CL1          21.5% (87)  25.1% (87)  26.4% (85)  28.6% (90)  28.0% (91)
output+CL3+CL2+CL1+CL0      23.3% (90)  26.4% (88)  27.7% (88)  30.2% (93)  28.6% (91)

Robustness, by pre-training type:
Layers                      1           2           3           4           5
output                      97.50%      81.90%      64.94%      66.16%      84.16%
output+CL3                  61.18%      60.58%      62.92%      64.74%      66.30%
output+CL3+CL2              62.50%      62.36%      61.84%      62.66%      65.64%
output+CL3+CL2+CL1          62.30%      63.30%      62.50%      64.58%      67.52%
output+CL3+CL2+CL1+CL0      63.86%      60.84%      63.36%      63.34%      66.26%
Table 3. Effects of pre-training type and CNN layers used for feature extraction on PT+LR performance. For the results shown, the logistic regression penalty is fixed to l2 and the inverse regularization strength is chosen from C ∈ {1, 2, 4, 8, 16, 32, 64, 128}.

Validity (avg BP#(v)), by pre-training type:
Layers                      1           2           3           4           5
output                      1% (78)     2.7% (67)   1.8% (49)   2.5% (22)   3.7% (43)
output+CL3                  71.9% (96)  94.7% (96)  96.4% (97)  98.1% (98)  99.7% (99)
output+CL3+CL2              78.8% (99)  95.9% (97)  97.9% (98)  98.6% (98)  99.8% (99)
output+CL3+CL2+CL1          79.2% (100) 95.9% (97)  98.1% (98)  98.7% (98)  99.8% (99)
output+CL3+CL2+CL1+CL0      78% (100)   95.9% (97)  98.1% (98)  98.7% (98)  99.8% (99)

Robustness, by pre-training type:
Layers                      1           2           3           4           5
output                      100.00%     84.28%      58.00%      74.32%      60.56%
output+CL3                  57.88%      54.64%      56.74%      54.74%      55.76%
output+CL3+CL2              56.08%      54.52%      56.68%      55.08%      55.92%
output+CL3+CL2+CL1          56.40%      54.34%      56.86%      55.42%      56.72%
output+CL3+CL2+CL1+CL0      57.00%      54.34%      56.86%      55.22%      56.92%
Effects of pre-training complexity. For both PT+SF and PT+LR, increasing the variation of the shape set led to an improvement in both validity and robustness, with the effect being stronger when using the logistic regression classifier. The only exception is when the output class distributions are the only extracted features, in which case the robustness scores may be unusually high due to the smaller size of BPs(v).
Effects of layer combination. For both pre-trained models, it appears that including more convolutional layers produces better features when measuring validity, but robustness is less affected. This may be due to the deeper convolutional layers learning features specific to the shape classification task and thus less applicable to other tasks [15]. We can also observe that the PT+LR model tends to over-fit when measuring validity, as indicated by its correspondingly low robustness.
The output layer consistently performs poorly for both PT+SF and PT+LR
in terms of validity, likely due to the small number of shape classes as listed in
Table 1. This also suggests that just knowing what basic shapes are present in
the image is helpful for solving only a small set of simple Bongard problems.
Effects of classifier. In the PT+SF model, we used a decision tree with
depth 1 to choose a single visual feature to serve as a rule for each Bongard
problem. From the results, this very simple classifier has generally smaller
validity scores compared with the PT+LR model, but is more robust. This ob-
servation matches the nature of BPs: they are often designed to be solved with
only one abstract rule or feature as an intended solution. Thus, PT+SF may
score higher in simplicity than PT+LR. The PT+LR model, using logistic re-
gression, linearly combines many features. Not surprisingly, this more expressive
classifier is capable of producing much higher validity.
Overall performance. While direct performance comparisons should not
be drawn to previous approaches due to differences in the types of rules automat-
ically produced, our approaches are capable of finding valid solutions to a greater
fraction of problems than previous approaches. Our PT+SF model finds valid solutions for up to 30.2% of the problems (∼60/200) and correctly classifies the two new test tiles for 66.3% of those problems (∼38/60). The PT+LR model achieves almost 100% validity, but at the cost of more complex solution rules. In contrast, [5] reports ∼7.5%, and the previous work most similar to ours, [7], which did not perform further test-set validation, reported 18% of 232 problems solved (19% of the 200 problems we use).
5.3 Rule Visualization
In Figure 5 we present 8 problems for which a valid solution was found by a
PT+SF model which used pre-training set #4 and tile embeddings from the
feature maps in the last convolutional layer. Highlighted areas indicate higher
values in the activation map chosen by the BP solver for that problem. The
intended rules are provided for each problem.
Problems (a) to (d) in Figure 5 have arguably interpretable rules. The in-
tended rule for (a) is ’small shapes present on the left but not the right’, and
as expected, small figures are highlighted by the activation map of the auto-
matically chosen filter. In both (b) and (c), the shapes clearly associated with
(a) BP #21 - small figure
present vs. no small figure
present
(b) BP #25 - filled figure is
triangle vs. filled figure is circle
(c) BP #94 - filled circle not at
endpoint vs. filled circle at
endpoint
(d) BP #183 - same curvature
close to the middle vs. change
of curvature close to middle
(e) BP #8 - on the right side vs.
on the left side
(f) BP #17 - angle directed
inwards vs. no inward angle
(g) BP #101 - parallel dents vs.
perpendicular dents
(h) BP #164 - number of objects
is one less than sides vs. number
of objects is more than sides
Fig. 5. Examples of interpretable (a to d) and non-interpretable (e to h) visualizations
of valid rules found by the PT+SF model for Bongard problems.
the intended rules are highlighted. However, for (d), it appears that a valid, al-
though non-intended solution was identified: there is more empty space around
the corners on the right than on the left. The intended solution for this problem
is ’same curvature close to the middle vs. change of curvature close to mid-
dle’. Problems (e) to (h) have also had valid solutions identified, but serve to
demonstrate that a standard method of visualizing what a CNN has learned is
frequently not well-suited for Bongard problems, as it is not always clear what
part of the tiles should be highlighted to make the discovered rule more visible.
6 Conclusions
Bongard problems are a kind of visual puzzle which require skills central to
human intelligence: divergent thinking, abstract thinking, and the ability to learn
from little data. To solve these problems given raw images, we train a CNN to
perform the related task of shape classification and use the globally-averaged
feature maps to produce feature vectors for the tiles of a BP. We observed that
increasing the shape variation of the pre-training data as well as extracting
features from deeper convolutional layers tended to improve the quality of the
extracted feature vectors, increasing the number of problems for which valid and
robust solutions could be discovered.
The present work hints at many promising avenues. While the author of a
problem may have a particular rule in mind, an automated solver may identify
many valid solutions. Adding an active learning component to the Bongard problem setting, requiring automated solvers to strategically test highly abstract hypotheses, may be an interesting direction. Developing a visualization technique capable of conveying
the abstract rules learned by an automated solver is another task which may
prove to be important.
References
1. Bongard, M.M.: The problem of recognition. Fizmatgiz, Moscow (1967)
2. Chollet, F.: keras. https://github.com/fchollet/keras (2015)
3. Depeweg, S., Rothkopf, C.A., Jäkel, F.: Solving Bongard problems with a visual language and pragmatic reasoning. arXiv preprint arXiv:1804.04452 (2018)
4. Fei-Fei, L., Fergus, R., Perona, P.: One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 594–611 (Apr 2006). https://doi.org/10.1109/TPAMI.2006.79
5. Foundalis, H.: Phaeaco: A cognitive architecture inspired by Bongard's problems. Ph.D. thesis, Indiana University, Bloomington (2006)
6. Hofstadter, D.R.: Gödel, Escher, Bach. Vintage Books, New York (1980)
7. Kharagorgiev, S.: Solving Bongard problems with deep learning (Feb 2018), https://k10v.github.io/2018/02/25/Solving-Bongard-problems-with-deep-learning/
8. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. CoRR
abs/1412.6980 (2014), http://arxiv.org/abs/1412.6980
9. Lake, B., Salakhutdinov, R., Gross, J., Tenenbaum, J.: One shot learning of simple
visual concepts. In: Proceedings of the Annual Meeting of the Cognitive Science
Society. vol. 33 (2011)
10. Lin, M., Chen, Q., Yan, S.: Network in network. arXiv preprint arXiv:1312.4400
(2013)
11. Raven, J.C., et al.: Raven’s progressive matrices. Western Psychological Services
Los Angeles, CA (1938)
12. Santoro, A., Hill, F., Barrett, D., Morcos, A., Lillicrap, T.: Measuring abstract
reasoning in neural networks. In: International Conference on Machine Learning.
pp. 4477–4486 (2018)
13. Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional net-
works: Visualising image classification models and saliency maps. arXiv preprint
arXiv:1312.6034 (2013)
14. Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al.: Matching networks
for one shot learning. In: Advances in Neural Information Processing Systems. pp.
3630–3638 (2016)
15. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In:
Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision – ECCV
2014. pp. 818–833. Springer International Publishing, Cham (2014)