24th International Conference on Business Information Systems, 14-17 June 2021, Hannover, Germany
Session: Knowledge Graphs
Pre-print, accepted for publication at the 24th International Conference on Business Information Systems
Mapping of ImageNet and Wikidata
for Knowledge Graphs Enabled Computer Vision
Dominik Filipiak1 [https://orcid.org/0000-0002-4927-9992], Anna Fensel1,2 [https://orcid.org/0000-0002-1391-7104], and Agata Filipowska3
1Semantic Technology Institute (STI) Innsbruck, Department of Computer Science, University of Innsbruck, Austria
2Wageningen University & Research, The Netherlands
3Department of Information Systems, Poznań University of Economics and Business, Poland
Abstract. Knowledge graphs are used as a source of prior knowledge in numerous computer vision tasks. However, such an approach requires a mapping between ground-truth data labels and the target knowledge graph. We linked the labels of the ILSVRC 2012 dataset (often simply referred to as ImageNet) to Wikidata entities. This enables the use of the rich knowledge graph structure and contextual information in several computer vision tasks traditionally benchmarked with ImageNet and its variations. For instance, in few-shot learning classification scenarios with neural networks, this mapping can be leveraged for weight initialisation, which can improve the final performance metrics. We mapped all 1000 ImageNet labels: 461 were already directly linked with the exact match property (P2888), 467 have exact match candidates, and 72 cannot be matched directly. For these 72 labels, we discuss different problem categories stemming from the inability to find an exact match. Semantically close non-exact match candidates are presented as well. The mapping is publicly available.
Keywords: ImageNet, Wikidata, mapping, computer vision, knowledge graphs
Introduction
Thanks to deep learning and convolutional neural networks, the field of computer vision has experienced rapid development in recent years. ImageNet (ILSVRC 2012) is one of the most popular datasets for training and benchmarking classification models in computer vision. Nowadays, intense effort can be observed in the domain of few- or zero-shot learning, which copes with various machine learning tasks for which training data is very scarce or even unavailable. More formally, N-way K-shot learning considers a setting in which there are N categories with K samples to learn from (typically K ≤ 20 in few-shot learning). This is substantially harder than the standard setting, as deep learning models usually rely on a large number of samples. One approach to few-shot learning relies on prior knowledge, such as the class label, which can be leveraged to improve task performance. For instance, Chen et al. presented the Knowledge Graph Transfer Network, which uses an adjacency matrix built from knowledge graph correlations to create class prototypes for few-shot learning classification. More generally, knowledge-embedded machine learning systems can use knowledge graphs as a source of information for improving performance metrics on a given task. One of these knowledge graphs is Wikidata, a popular collaborative knowledge graph.
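To make the N-way K-shot setting concrete, the episode construction described above can be sketched as a toy sampler (a minimal illustration; the function and the toy dataset are ours, not from the paper):

```python
import random

def sample_episode(dataset, n_way=5, k_shot=5, q_queries=15, seed=None):
    """Sample one N-way K-shot episode from {class_label: [samples]}.

    Returns (support, query): the support set holds K samples for each of
    the N sampled classes; the query set holds the held-out examples.
    """
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)       # pick N categories
    support, query = [], []
    for c in classes:
        picks = rng.sample(dataset[c], k_shot + q_queries)
        support += [(x, c) for x in picks[:k_shot]]    # K samples to learn from
        query += [(x, c) for x in picks[k_shot:]]      # held-out evaluation samples
    return support, query

# Toy dataset: 10 classes with 30 samples each.
data = {f"class_{i}": list(range(30)) for i in range(10)}
support, query = sample_episode(data, n_way=5, k_shot=5, seed=0)
# 5 classes x 5 shots = 25 support pairs; 5 x 15 = 75 query pairs
```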
Our main research goal is to facilitate general-purpose knowledge-graph-enabled computer vision methods, such as the aforementioned Knowledge Graph Transfer Network. In this paper, we provide a mapping between ImageNet classes and Wikidata entities, as this is the first step towards this goal. Our paper is inspired by and builds on top of the work of Nielsen, who first explored the possibility of linking ImageNet WordNet synsets with Wikidata. We also aim at providing detailed explanations for our choices and compare the results with those provided by Nielsen. Our publicly available mapping links the WordNet synsets used as ImageNet labels with Wikidata entities. It will be useful for the aforementioned computer vision tasks. Practical usage scenarios include situations in which labelling data is a costly process and the considered classes can be linked to a given graph (that is, few- or zero-shot learning tasks). However, simpler tasks, such as classification, can also use context knowledge stemming from the rich knowledge graph structure (in prototype learning, for instance).
The remainder of this paper is structured as follows. In the next section, we briefly discuss related work. Then, in the third section, we provide detailed explanations of the mapping process, focusing on the entities which do not have a perfect match candidate. The next section provides some analytics describing the mapping, as well as a comparison with automated matching using a NERD tool, entity-fishing. The paper is concluded with a summary. Most importantly, the mapping is publicly available1.
Background and Related Work
To provide a mapping between ILSVRC 2012 and Wikidata, it is necessary to define some related terms first. This requires introducing a few additional topics, such as WordNet, since some concepts (such as structure) in ILSVRC 2012 are based on it. This section provides a comprehensive overview of these concepts. We also review the existing literature on the problem of finding this specific mapping. To the best of our knowledge, there have been only two attempts to achieve this goal; both are described below.
WordNet is a large lexical database of (primarily) English words. Nouns and verbs have a hierarchical structure and are grouped into synsets (sets of synonyms). Historically, this database played a significant role in various pre-deep-learning-era artificial intelligence applications (and it is still used nowadays). ImageNet is a large image database which inherited its hierarchical structure from WordNet. It contains 14,197,122 images and 21,841 WordNet-based synsets at the time of writing, which makes it an important source of ground-truth data for computer vision. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was an annual competition for computer vision researchers. The datasets released each year (subsets of the original ImageNet) form a popular benchmark for various tasks to this day. The one released at ILSVRC 2012 is particularly popular and is commonly called ImageNet2 to this date. It gained scientific attention due to the winning architecture, AlexNet, which greatly helped to popularise deep learning. ImageNet is an extremely versatile dataset: architectures coping with it have usually been successful with different datasets as well. Models trained on ImageNet are widely used for transfer learning.
Launched in 2012, Wikidata is a collaborative knowledge graph hosted by the Wikimedia Foundation. It provides a convenient SPARQL endpoint. To this date, it is an active project and an important source of information for, e.g., Wikipedia articles. Due to its popularity, size, and ubiquity, Wikidata can be considered one of the most popular and successful knowledge graph instances, along with DBpedia and the Freebase-powered Google Knowledge Graph. Given the recent interest in the ability to leverage external knowledge in computer vision tasks, it would therefore be beneficial to map ImageNet classes to the corresponding Wikidata entities. The idea itself is not new, though a full mapping was not available to this date. To the best of our knowledge, Nielsen was the first to tackle this problem. He summarised the problems encountered while preparing the mapping and classified them into a few categories: missing synsets on Wikidata, matching with a disambiguation page, discrepancies between ImageNet and WordNet, differences between WordNet and Wikidata concepts with similar names, and multiple semantically similar items in WordNet and Wikidata. Nielsen described his effort in detail, though the full mapping was not published. Independently, Edwards tried to map DBpedia and Wikidata to ImageNet (in a larger sense, not only ILSVRC 2012) using various pre-existing mappings and knowledge graph embedding methods, such as TransE, though the results of this mapping have not been published either. Contrary to these papers, we publish our mapping.
2From now on, we will refer to the ILSVRC 2012 dataset as simply ImageNet, unless directly stated otherwise.
The Mapping
This section is devoted to the mapping between the ImageNet dataset and Wikidata. First, we explain our approach to providing such a mapping. Then, we identify and group the key issues that occurred in the mapping process. We also provide more detailed explanations for the most problematic entities.
To prepare the mapping, we follow the approach and convention presented by Nielsen. Namely, we use synset names from WordNet 3.0 (as opposed to, for example, WordNet 3.1). That is, we first check the skos:exactMatch (P2888) property for an existing mapping between Wikidata entities and WordNet synsets. This has to be done for every ImageNet class. For example, for the ImageNet synset n02480855 we search for P2888 equal to http://wordnet-rdf.princeton.edu/wn30/02480855-n using the Wikidata SPARQL endpoint. Listing 1 provides a SPARQL query for this example.

SELECT ?item WHERE {
  VALUES ?uri { <http://wordnet-rdf.princeton.edu/wn30/02480855-n> }
  ?item wdt:P2888 ?uri .
}

Listing 1. Matching WordNet with Wikidata entities using SPARQL.
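In Python, iterating this P2888 check over the ImageNet synset IDs can be sketched as follows (the helper names are ours; the generated query mirrors Listing 1, and actually sending it to the Wikidata SPARQL endpoint is omitted):

```python
# Build the WordNet 3.0 RDF URI and the corresponding P2888 lookup query
# for an ImageNet synset ID such as 'n02480855'.
WN30_PREFIX = "http://wordnet-rdf.princeton.edu/wn30/"

def synset_to_wn30_uri(synset_id: str) -> str:
    """'n02480855' -> 'http://wordnet-rdf.princeton.edu/wn30/02480855-n'
    (the leading part-of-speech letter moves to the end of the URI)."""
    pos, offset = synset_id[0], synset_id[1:]
    return f"{WN30_PREFIX}{offset}-{pos}"

def build_exact_match_query(synset_id: str) -> str:
    """SPARQL query finding Wikidata items whose P2888 (exact match)
    points at the given WordNet 3.0 synset."""
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:P2888 <{synset_to_wn30_uri(synset_id)}> .\n"
        "}"
    )

print(build_exact_match_query("n02480855"))
```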
As of November 2020, there were 461 already-linked synsets out of 1000 in ImageNet using the wdt:P2888 property. For the rest, the mapping has to be provided. Unlike Edwards, we do not rely on automated methods, since the remaining 539 entities can be checked by hand (although we test one such method in the next section). Using manual search on Google Search, we found good skos:exactMatch candidates for the next 467 ImageNet classes. These matches can be considered for direct addition to Wikidata, as they reflect exactly the same concepts. For the vast majority of cases, a simple heuristic was enough: type the synset name in the search engine, check the first Wikipedia result, evaluate its fitness and then use its Wikidata item link. Using this method, one can link 928 classes in total (with 467 entities matched by hand).
Sometimes, the search yielded two similar concepts. Such synsets were subject to qualitative analysis aimed at providing the best match. Similarly, sometimes there is no good match at all. At this stage, 72 out of 1000 classes remain unmatched. Here, we list our proposals for them. We encountered problems similar to Nielsen, though we propose a different summary of common issues. We categorised them into the following problem categories:
Table 1. Mapping – hyponymy.
WordNet 3.0 synset → Wikidata entity
n03706229 (magnetic compass) → Q34735 (compass)
n02276258 (admiral) → Q311218 (vanessa)
n03538406 (horse cart) → Q235356 (carriage)
n03976467 (Polaroid camera, Polaroid Land camera) → Q313695 (instant camera)
n03775071 (mitten) → Q169031 (glove)
n02123159 (tiger cat) → Q1474329 (tabby cat)
n03796401 (moving van) → Q193468 (van)
n04579145 (whisky jug) → Q2413314 (jug)
n09332890 (lakeshore) → Q468756 (shore)
n01685808 (whiptail, whiptail lizard) → Q1004429 (Cnemidophorus)
n03223299 (doormat) → Q1136834 (mat)
n12768682 (buckeye, horse chestnut, conker) → Q11009 (nut)
n03134739 (croquet ball) → Q18545 (ball)
n09193705 (alp) → Q8502 (mountain)
n03891251 (park bench) → Q204776 (bench)
hyponymy; animals and their size, age, and sex; ambiguous synsets; and non-exact match. Each of these represents a different form of trade-off made in order to provide the full mapping. This is not a classification in a strict sense, as some of the cases could be placed in several of the aforementioned groups.
Hyponymy. This is the situation in which the level of granularity of WordNet synsets does not match that of Wikidata. As a consequence, some terms do not have a dedicated entity. Therefore, we performed semantic inclusion, searching for a more general “parent” entity which contains the specific case. Examples include magnetic compass (extended to compass), mitten (glove), or whisky jug (jug). The cases falling into this category are presented in Table 1.
Animals and their size, age, and sex. This set of patterns is actually a subcategory of hyponymy, but these animal-related nouns posed several problems worth distinguishing. The first one concerns a situation in which a WordNet synset describes a particular sex of a given animal. This information is often missing on Wikidata, which means that the broader semantic meaning has to be used. For example, drake was mapped to duck, and ram, tup to sheep. However, while hen was linked to chicken, for cock, rooster (n01514668) there exists an exact match (Q2216236). Another pattern concerns distinguishing animals of different age and size. For instance, lorikeet in WordNet is defined as “any of various small lories”. As this definition is a bit imprecise, we decided to use loriini. In another example, eft (a juvenile newt) was linked to newt. Similarly, there are eastern and western green mambas, but WordNet defines green mamba as “the green phase of the black mamba”. The poodle breed has three varieties (toy, miniature, and standard poodle), but Wikidata does not distinguish between them – all were therefore linked to poodle (Q38904). These mappings are summarised in Table 2.
Ambiguous synsets. This is a situation in which a set of synonyms does not necessarily consist of synonyms (at least in terms of Wikidata entities). That is, for a synset containing at least two synonyms, there is more than one possible Wikidata entity. At the same time, a broader term for semantic extension does not necessarily exist, since these concepts can be mutually exclusive. For instance, for the synset African chameleon, Chamaeleo chamaeleon there exist two candidates on Wikidata: Chamaeleo chamaeleon and Chamaeleo africanus. We chose the first one due to the WordNet definition, “a chameleon found in Africa”. Another synset,
Table 2. Mapping – animals and their size, age, and sex.
WordNet 3.0 synset → Wikidata entity
n01847000 (drake) → Q3736439 (duck)
n01514859 (hen) → Q780 (chicken)
n01806143 (peacock) → Q201251 (peafowl)
n02412080 (ram, tup) → Q7368 (sheep)
n01820546 (lorikeet) → Q15274050 (loriini)
n01631663 (eft) → Q10980893 (newt)
n01749939 (green mamba) → Q194425 (mamba)
n02113624 (toy poodle) → Q38904 (poodle)
n02113712 (miniature poodle) → Q38904 (poodle)
n02113799 (standard poodle) → Q38904 (poodle)
n02087046 (toy terrier) → Q37782 (English Toy Terrier)
academic gown, academic robe, judge’s robe, contains at least two quite different notions – we have chosen academic dress, as this meaning seems to be dominant in ImageNet. Harvester, reaper is an imprecise category in ImageNet, since it contains a variety of agricultural tools, not only those suggested by the synset name. Bonnet, poke bonnet has a match in Wikidata’s bonnet, though it is worth noting that ImageNet focuses on babies wearing this specific headwear. The mapping of this category can be found in Table 3.
Non-exact match. Sometimes, however, there is no good exact match for a given synset among Wikidata entities, while the broader term might be too broad. This leads to unavoidable inaccuracies. For example, for nipple we have chosen its meronym, baby bottle. Interestingly, nipple exists in Polish Wikidata, though it does not have any properties, which makes it useless in further applications. Other examples involve somewhat similar meanings – tile roof was mapped to roof tile, and steel arch bridge to through arch bridge. Plate rack was linked to dish drying cabinet, though this is not entirely accurate, as ImageNet contains pictures of racks not designated for drying but sometimes for displaying dishes. In another example, we map baseball player to Category:baseball players. ImageNet contains photos of different kinds of stemware, not only goblets. Cassette was linked to a more fine-grained entity (audio cassette), as the images present audio cassettes in different settings. Table 4 summarises the mappings falling into this category.
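For downstream use, entries like those in Tables 1–4 can be represented with a small record type. The structure below is our illustration, not the published file format; the example rows are taken from the tables above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MappingEntry:
    synset_id: str      # WordNet 3.0 / ImageNet synset ID
    wikidata_id: str    # Wikidata Q-identifier
    match: str          # "exact", "hyponymy", "ambiguous", or "non-exact"

# A few rows from the tables above, annotated with their problem category.
MAPPING = [
    MappingEntry("n03706229", "Q34735", "hyponymy"),     # magnetic compass -> compass
    MappingEntry("n01847000", "Q3736439", "hyponymy"),   # drake -> duck
    MappingEntry("n03443371", "Q14920412", "non-exact"), # goblet -> stemware
]

by_synset = {e.synset_id: e for e in MAPPING}
print(by_synset["n03706229"].wikidata_id)  # Q34735
```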
ImageNet itself is not free from errors, since it is biased towards certain skin colours, genders, or ages. This is a great concern for ethical artificial intelligence researchers, since models trained on ImageNet are ubiquitous. There are, though, some ongoing efforts to fix this with a more balanced set of images. Beyer et al. listed numerous problems with ImageNet, such as a single label per image, a restrictive annotation process, or practically duplicate classes. They proposed a set of new, more realistic labels (ReaL) and argued that models trained in such a setting achieve better performance. Even given these drawbacks, ImageNet is still ubiquitous. Naturally, the presented mapping inherits problems present in ImageNet, such as those in which images do not really present what the synset name suggests. This problem was previously reported by Nielsen, who described it as a discrepancy between ImageNet and WordNet. Examples include radiator, which in ImageNet represents a home radiator, whereas the Wikidata definition for the same name describes a somewhat broader notion (for instance, it also includes car radiators). Monitor is a similar example, since it might be any display device, though in ImageNet it is connected mostly to computer displays. Sunscreen, sunblock, sun blocker represents photos of both products and their application on the human body, which look completely different and might be split into two distinct classes.
Table 3. Mapping – ambiguities.
WordNet 3.0 synset → Wikidata entity
n01694178 (African chameleon, Chamaeleo chamaeleon) → Q810152 (Chamaeleo africanus)
n02669723 (academic gown, academic robe, judge’s robe) → Q1349227 (academic dress)
n02894605 (breakwater, groin, groyne, mole, bulwark, seawall, jetty) →
n01755581 (diamondback, diamondback rattlesnake, Crotalus adamanteus) → Q744532 (eastern diamondback rattlesnake)
n03110669 (cornet, horn, trumpet, trump) → Q202027 (cornet)
n03899768 (patio, terrace) → Q737988 (patio)
n04258138 (solar dish, solar collector, solar furnace) → Q837515 (solar collector)
n03016953 (chiffonier, commode) → Q2746233 (chiffonier)
n02114548 (white wolf, Arctic wolf, Canis lupus tundrarum) → Q216441 (Arctic wolf)
n13133613 (ear, spike, capitulum) → Q587369 (Pseudanthium)
n04509417 (unicycle, monocycle) → Q223924 (unicycle)
n01729322 (hognose snake, puff adder, sand viper) → Q5877356 (hognose)
n01735189 (garter snake, grass snake) → Q1149509 (garter snake)
n02017213 (European gallinule, Porphyrio porphyrio) → Q187902 (Porphyrio porphyrio)
n02013706 (limpkin, Aramus pictus) → Q725276 (limpkin)
n04008634 (projectile, missile) → Q49393 (projectile)
n09399592 (promontory, headland, head, foreland) → Q1245089 (promontory)
n01644900 (tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus truei) → Q2925426 (tailed frog)
n02395406 (hog, pig, grunter, squealer, Sus scrofa) → Q787 (pig)
n02443114 (polecat, fitch, foulmart, foumart, Mustela putorius) → Q26582 (Mustela putorius)
n03017168 (chime, bell, gong) → Q101401 (bell)
n02088466 (bloodhound, sleuthhound) → Q21098 (bloodhound)
n03595614 (jersey, t-shirt) → Q131151 (t-shirt)
n03065424 (coil, spiral, volute, whorl, helix) → Q189114 (spiral)
n03594945 (jeep, land rover) → Q946596 (off-road vehicle)
n01753488 (horned viper, cerastes, sand viper, horned asp, Cerastes cornutus) → Q1476343 (Cerastes cerastes)
n03496892 (harvester, reaper) → Q1367947 (reaper)
n02869837 (bonnet, poke bonnet) → Q1149531 (bonnet)
Table 4. Mapping – non-exact matches.
WordNet 3.0 synset → Wikidata entity
n04311004 (steel arch bridge) → Q5592057 (through arch bridge)
n01737021 (water snake) → Q2163958 (common water snake)
n07714571 (head cabbage) → Q35051 (white cabbage)
n01871265 (tusker) → Q7378 (elephant)
n04344873 (studio couch, day bed) → Q19953097 (sofa bed)
n03961711 (plate rack) → Q1469010 (dish drying cabinet)
n04505470 (typewriter keyboard) → Q46335 (typewriter)
n03825788 (nipple) → Q797906 (baby bottle)
n04435653 (tile roof) → Q268547 (roof tile)
n09835506 (baseball player) → Q7217606 (Category:baseball players)
n02966687 (carpenter’s kit, tool kit) → Q1501161 (toolbox)
n02860847 (bobsleigh – sleigh) → Q177275 (bobsleigh – sport)
n04493381 (tub, vat) → Q152095 (bathtub)
n03443371 (goblet) → Q14920412 (stemware)
n02978881 (cassette) → Q149757 (audio cassette)
Analysis and Automated Matching
We also check to what extent the process can be automated, as this might be useful for larger subsets of ImageNet (in a broad sense). In this section, we present the results of such an investigation. We also provide a concise analysis of the number of direct properties, which is a crucial feature in light of the future usage of the mapping in various computer vision settings.
Foppiano and Romary developed entity-fishing, a tool for named entity recognition and disambiguation (NERD). This tool can be employed to provide an automatic mapping between ImageNet and Wikidata. We used indexed data built from the Wikidata and Wikipedia dumps from 20.05.2020. For this experiment, each synset is split on commas. For example, the synset barn spider, Araneus cavaticus (n01773549) is split into two synset elements: barn spider and Araneus cavaticus. For each of these elements, the term lookup service from entity-fishing is called, which searches the knowledge base for the given terms in order to provide match candidates. Since this service provides a list of entities ranked by their conditional probability, we chose the one with the highest value.
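The splitting and candidate-selection steps above can be sketched as follows. The `prob_c` field name is our assumption about the shape of the entity-fishing term lookup response; the comma-splitting logic follows the description above:

```python
def synset_elements(synset_name):
    """Split a synset name such as 'barn spider, Araneus cavaticus'
    into its comma-separated elements."""
    return [part.strip() for part in synset_name.split(",") if part.strip()]

def best_candidate(senses):
    """Return the candidate sense with the highest conditional probability
    ('prob_c'), or None when the lookup returned nothing."""
    return max(senses, key=lambda s: s.get("prob_c", 0.0)) if senses else None

elems = synset_elements("barn spider, Araneus cavaticus")
# elems == ["barn spider", "Araneus cavaticus"]
```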
We start with the 461 already-linked instances, which can be treated as ground-truth data for this experiment. Among them, for 387 (84%) synsets there was at least one correct suggestion (for example, of the previously mentioned elements barn spider and Araneus cavaticus, at least one was matched to Q1306991). In particular, 286 (62%) synsets were correctly matched for all their elements (for example, barn spider and Araneus cavaticus were both matched to Q1306991). While these results show that NERD tools can speed up the linking process by narrowing down the number of entities to be searched in some cases, they do not replace manual mapping completely – especially in the more complicated and ambiguous cases mentioned in the previous section. Nevertheless, for the remaining 539 synsets, which were manually linked, an identical NERD experiment was performed, which resulted in similar figures. For 448 (83%) synsets, entity-fishing provided the same match for at least one synset element. Similarly, for 342 synsets (63%) the tool yielded the same match for all elements. Although these figures are relatively high, they show that a mapping obtained this way may contain some discrepancies, which justifies the manual process presented in the previous section.
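The two figures above (“at least one element correct” vs. “all elements correct”) can be computed with a helper of this shape (the function name and dictionary layout are ours):

```python
def evaluate_nerd(gold, predicted):
    """Compare NERD suggestions with a ground-truth mapping.

    gold: {synset_id: wikidata_id}
    predicted: {synset_id: {element: suggested_wikidata_id}}
    Returns (at_least_one, all_elements) counts over the gold synsets.
    """
    at_least_one = all_elements = 0
    for sid, target in gold.items():
        hits = [qid == target for qid in predicted.get(sid, {}).values()]
        at_least_one += any(hits)                 # some element matched
        all_elements += bool(hits) and all(hits)  # every element matched
    return at_least_one, all_elements

counts = evaluate_nerd(
    {"s1": "Q1", "s2": "Q2", "s3": "Q3"},
    {"s1": {"a": "Q1", "b": "Q1"}, "s2": {"a": "Q2", "b": "Q9"}, "s3": {"a": "Q9"}},
)
# counts == (2, 1)
```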
Table 5. Most popular properties in the mapping (multiple occurrences of the same property for a given entity were counted as one).
property label count
P646 (Freebase ID) 932
P373 (Commons category ) 927
P18 (image) 911
P8408 (KBpedia ID) 687
P910 (topic’s main category) 681
P279 (subclass of ) 659
P1417 (Encyclopædia Britannica Online ID) 618
P8814 (WordNet 3.1 Synset ID) 551
P31 (instance of ) 524
P1014 (Art & Architecture Thesaurus ID) 482
Similarly to Nielsen, we also count the number of direct properties available in Wikidata. This is a crucial feature, since it enables leveraging the knowledge graph structure. Listing 2 shows the query used for obtaining the number of properties for Q29022. The query was repeated for each mapped entity. Figure 1 depicts a histogram of direct properties for the 1000 mapped classes. The histogram presents a right-skewed distribution (normal-like after taking the natural logarithm) with a mean of 28.28 (σ = 22.77). Only one entity has zero properties (wall clock).
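The reported distribution statistics (mean, σ, and the log-transformed view) can be computed from per-entity property counts with a small helper (a sketch; the function name is ours):

```python
import math
import statistics

def property_count_stats(counts):
    """Mean and population standard deviation of direct-property counts,
    plus the same statistics after a natural-log transform (entities with
    zero properties are skipped in the log view)."""
    logs = [math.log(c) for c in counts if c > 0]
    return (
        statistics.mean(counts),
        statistics.pstdev(counts),
        statistics.mean(logs),
        statistics.pstdev(logs),
    )
```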
In total, there are 992 Wikidata entities used in the mapping, as some of them were used several times, like the aforementioned poodle. These entities have 626 unique properties in total. The most popular ones are listed in Table 5. From the perspective of computer vision, an important subset of these properties includes P373 (Commons category), P910 (topic’s main category), and P279 (subclass of), as they directly reflect hyponymy and hypernymy within the knowledge graph. Such information can later be leveraged to detect (dis-)similar nodes in a direct way, for example using graph path distance in SPARQL for entities sharing common ancestors with respect to a given property. However, SPARQL does not allow counting the number of arbitrary properties between two given entities. Using graph embeddings is a potential workaround for this issue. For example, one can calculate distances from the 200-dimensional pre-trained embeddings provided by the PyTorch-BigGraph library. Another possible direction considers leveraging other linked knowledge graphs, such as Freebase (P646), which is linked to the majority of the considered instances.
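As an illustration of the SPARQL path idea above, a query testing whether two mapped entities share a common ancestor with respect to a given property can be generated like this (the query shape and the example pair are ours):

```python
def shared_ancestor_query(qid_a: str, qid_b: str, prop: str = "P279") -> str:
    """ASK query: do the two entities share an ancestor reachable via
    zero or more hops of the given property (P279, 'subclass of')?"""
    return (
        "ASK {\n"
        f"  wd:{qid_a} wdt:{prop}* ?ancestor .\n"
        f"  wd:{qid_b} wdt:{prop}* ?ancestor .\n"
        "}"
    )

# Hypothetical pair from the mapping: poodle (Q38904) and tabby cat (Q1474329).
print(shared_ancestor_query("Q38904", "Q1474329"))
```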
SELECT (COUNT(?property) AS ?count) WHERE {
  wd:Q29022 ?property ?value .
}

Listing 2. Counting direct properties for a single mapped entity. Based on: Nielsen.
Figure 1. A histogram of direct properties.

In this paper, we presented a complete mapping between ILSVRC 2012 synsets and Wikidata. For 461 classes, such a mapping already existed in Wikidata via skos:exactMatch. For another 467 classes, we found candidates which match their corresponding synsets. For the 72 classes without a direct match, we proposed a detailed justification of our choices. We also compared our mapping with one obtained through an automated process. To the best of our knowledge, we are the first to publish a mapping between ImageNet and Wikidata. The mapping is publicly available for use and validation in various computer vision scenarios.
Future work should focus on empirically testing the mapping. Our results are intended to be beneficial for general-purpose computer vision research, since knowledge graphs can be leveraged as a source of contextual information for various tasks; our analysis showed that the vast majority of the linked entities have a considerable number of direct properties. This fact can be utilised according to the given computer vision task. For example, it may be used to generate low-level entity (label) embeddings and calculate distances between them in order to create the correlation matrix used by the Knowledge Graph Transfer Network in the task of few-shot image classification. This architecture leverages prior knowledge regarding the semantic similarity of the considered labels (called a correlation matrix in the paper), which is used for creating class prototypes. These prototypes help the classifier learn novel categories with only a few samples available. Correlations might be calculated using simple graph path distance, as well as more sophisticated low-dimensional knowledge graph embeddings with some distance metric between instances. In this case, this will result in a 1000×1000 matrix, as there are 1000 labels in ImageNet. Embeddings from pre-trained models (such as the aforementioned PyTorch-BigGraph embeddings) might be used for this task.
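Given label embeddings (for instance, the 200-dimensional PyTorch-BigGraph vectors mentioned above), such a correlation matrix can be sketched as pairwise cosine similarities; the function below is our minimal illustration on toy vectors:

```python
import math

def correlation_matrix(embeddings):
    """Pairwise cosine similarities between label embedding vectors.

    embeddings: list of equal-length vectors, one per label. For the 1000
    ImageNet labels this produces the 1000x1000 matrix discussed above.
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v)) or 1.0  # guard zero vectors
    unit = [[x / norm(v) for x in v] for v in embeddings]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]

m = correlation_matrix([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
# m[0][1] == 0.0 (orthogonal vectors); m[0][2] is about 0.707
```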
Future work might also consider extending the mapping so that it covers larger subsets of ImageNet (in a broad sense), such as ImageNet-6K, a dataset consisting of 6000 ImageNet categories. Preparation of such a large mapping might require a more systematic and collaboratively oriented approach, which can help to create, verify and reuse the results. The presented approach can also be used for providing mappings between ImageNet and other knowledge graphs. Another possible application might consider further mapping to actions, which might be particularly interesting for robotics, where robots would decide which actions to take based on such mappings.
CRediT – Contributor Roles Taxonomy
Dominik Filipiak: conceptualization, data curation, formal analysis, investigation, methodology, software, validation, writing – original draft. Anna Fensel: conceptualization, funding acquisition, project administration, writing – review & editing, validation, resources. Agata Filipowska: writing – review & editing, validation, resources.
Acknowledgements
This research was co-funded by the Interreg Österreich-Bayern 2014-2020 programme project KI-Net: Bausteine für KI-basierte Optimierungen in der industriellen Fertigung (grant agreement:
References
S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives. DBpedia: A nucleus for a web of open data. In The Semantic Web, pages 722–735. Springer, 2007.
L. Beyer, O. J. Hénaff, A. Kolesnikov, X. Zhai, and A. v. d. Oord. Are we done with ImageNet? arXiv preprint arXiv:2006.07159, 2020.
A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko. Translating embeddings for modeling multi-relational data. Advances in Neural Information Processing Systems, 26:2787–2795, 2013.
 R. Chen, T. Chen, X. Hui, H. Wu, G. Li, and L. Lin. Knowledge graph transfer network for
few-shot recognition. In AAAI, pages 10575–10582, 2020.
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
C. Edwards. Linking knowledge graphs and images using embeddings. https://
L. Foppiano and L. Romary. entity-fishing: a DARIAH entity recognition and disambiguation service. Journal of the Japanese Association for Digital Humanities, 5(1):22–60, 2020.
 M. Huh, P. Agrawal, and A. A. Efros. What makes imagenet good for transfer learning?
arXiv preprint arXiv:1608.08614, 2016.
A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017.
 A. Lerer, L. Wu, J. Shen, T. Lacroix, L. Wehrstedt, A. Bose, and A. Peysakhovich. PyTorch-
BigGraph: A Large-scale Graph Embedding System. In Proceedings of the 2nd SysML
Conference, Palo Alto, CA, USA, 2019.
G. A. Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995.
F. Å. Nielsen. Linking ImageNet WordNet synsets with Wikidata. In Companion Proceedings of The Web Conference 2018, pages 1809–1814, 2018.
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
 I. Stavrakantonakis, A. Fensel, and D. Fensel. Matching web entities with potential actions.
In SEMANTICS (Posters & Demos), pages 35–38. Citeseer, 2014.
D. Vrandečić and M. Krötzsch. Wikidata: a free collaborative knowledgebase. Communications of the ACM, 57(10):78–85, 2014.
 W. Wang, V. W. Zheng, H. Yu, and C. Miao. A survey of zero-shot learning: Settings,
methods, and applications. ACM Transactions on Intelligent Systems and Technology
(TIST), 10(2):1–37, 2019.
 Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni. Generalizing from a few examples: A survey on
few-shot learning. ACM Computing Surveys (CSUR), 53(3):1–34, 2020.
H.-M. Yang, X.-Y. Zhang, F. Yin, and C.-L. Liu. Robust classification with convolutional prototype learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3474–3482, 2018.
 K. Yang, K. Qinami, L. Fei-Fei, J. Deng, and O. Russakovsky. Towards fairer datasets:
Filtering and balancing the distribution of the people subtree in the imagenet hierarchy.
In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency,
pages 547–558, 2020.
A. V. Zhdanova and P. Shvaiko. Community-driven ontology matching. In European Semantic Web Conference, pages 34–49. Springer, 2006.