24th International Conference on Business Information Systems, 14-17 June 2021, Hannover, Germany
Session: Knowledge Graphs
DOI: to be assigned by TIB Open Publishing
© Authors
Pre-print, accepted for publication at the 24th International Conference on Business Information Systems
Mapping of ImageNet and Wikidata
for Knowledge Graphs Enabled Computer Vision
Dominik Filipiak1[https://orcid.org/0000-0002-4927-9992], Anna Fensel1,2[https://orcid.org/0000-0002-1391-7104], and
Agata Filipowska3[https://orcid.org/0000-0002-8425-1872]
1Semantic Technology Institute (STI) Innsbruck, Department of Computer Science, University of Innsbruck, Austria
2Wageningen University & Research, The Netherlands
3Department of Information Systems, Poznań University of Economics and Business, Poland
Abstract. Knowledge graphs are used as a source of prior knowledge in numerous computer
vision tasks. However, such an approach requires a mapping between ground-truth data
labels and the target knowledge graph. We linked the ILSVRC 2012 dataset (often
simply referred to as ImageNet) labels to Wikidata entities. This enables the use of rich
knowledge graph structure and contextual information in several computer vision tasks
traditionally benchmarked with ImageNet and its variations. For instance, in few-shot
classification scenarios with neural networks, the mapping can be leveraged for weight
initialisation, which can improve final performance. We mapped all 1000 ImageNet
labels – 461 were already directly linked with the exact match property (P2888), 467 have exact
match candidates, and 72 cannot be matched directly. For these 72 labels, we discuss
different problem categories stemming from the inability to find an exact match. Semantically
close non-exact match candidates are presented as well. The mapping is publicly available at
https://github.com/DominikFilipiak/imagenet-to-wikidata-mapping.
Keywords: ImageNet, Wikidata, mapping, computer vision, knowledge graphs
Introduction
Thanks to deep learning and convolutional neural networks, the field of computer vision has
experienced rapid development in recent years. ImageNet (ILSVRC 2012) is one of the most popular
datasets used for training and benchmarking classification models in computer vision.
Nowadays, an intense effort can be observed in the domain of few-shot [17] and zero-shot
learning [16], which cope with machine learning tasks for which training data is very
scarce or even unavailable. More formally, N-way K-shot learning considers a setting in
which there are N categories with K samples to learn from (typically K ≤ 20 in few-shot learning).
This is substantially harder than standard settings, as deep learning models usually rely
on a large number of provided samples. One of the approaches to few-shot learning
relies on some prior knowledge, such as the class label. This can be leveraged to improve
the performance of the task. For instance, Chen et al. [4] presented Knowledge Graph Trans-
fer Network, which uses the adjacency matrix built from knowledge graph correlations in order
to create class prototypes in a few-shot learning classification. More generally, knowledge-
embedded machine learning systems can use knowledge graphs as a source of information for
improving performance metrics for a given task. One of these knowledge graphs is Wikidata
[15], a popular collaborative knowledge graph.
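To make the N-way K-shot setting above concrete, one episode can be sampled as in the following Python sketch. This is our own illustration, not part of any benchmark code: the dataset structure, function name, and toy labels are assumptions.

```python
import random

def sample_episode(dataset, n_way=5, k_shot=5, q_queries=15, seed=None):
    """Sample one N-way K-shot episode: N classes, each with K support
    and Q query samples. `dataset` maps a class label to a list of image
    identifiers (a placeholder structure for illustration)."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)
    support, query = {}, {}
    for c in classes:
        picks = rng.sample(dataset[c], k_shot + q_queries)
        support[c] = picks[:k_shot]   # the K samples to learn from
        query[c] = picks[k_shot:]     # held-out samples for evaluation
    return support, query

# toy usage: 10 hypothetical classes with 30 images each
toy = {f"class_{i}": [f"img_{i}_{j}" for j in range(30)] for i in range(10)}
support, query = sample_episode(toy, n_way=5, k_shot=5, seed=0)
```

In this setting a knowledge graph mapping supplies side information about the N sampled classes, which is exactly what the label-to-Wikidata links below enable.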
Our main research goal is to facilitate general-purpose knowledge-graph-enabled
computer vision methods, such as the aforementioned knowledge graph transfer network.
In this paper, we provide a mapping between ImageNet classes and Wikidata entities, as this is
the first step towards this goal. Our paper is inspired by and builds on the work of Nielsen
[12], who first explored the possibility of linking ImageNet WordNet synsets with Wikidata. We
also aim at providing detailed explanations for our choices and compare our results with those
provided by Nielsen. Our publicly available mapping links the WordNet synsets used as ImageNet
labels with Wikidata entities. It will be useful for the aforementioned computer vision tasks.
Practical usage scenarios consider situations in which labelling data is a costly process and the
considered classes can be linked to a given graph (that is, for few- or zero-shot learning tasks).
However, simpler tasks, such as classification, can also use context knowledge stemming from
rich knowledge graph structure (in prototype learning [18], for instance).
The remainder of this paper is structured as follows. In the next section, we briefly discuss
related work. Then, in the third section, we provide detailed explanations about the mapping
process, which is focused on the entities which do not have a perfect match candidate. The
next section provides some analytics describing the mapping, as well as a comparison with
automated matching using a NERD tool – entity-fishing [7]. The paper is concluded with a
summary. Most importantly, the mapping is publicly available1.
Background and Related Work
To provide a mapping between ILSVRC 2012 and Wikidata, it is necessary to define some
related terms first. This requires introducing a few additional topics, such as WordNet, since
some concepts (such as structure) in ILSVRC 2012 are based on the former. This section
provides a comprehensive overview of these concepts. We also review the existing literature on
this specific mapping problem. To the best of our knowledge, there have been
only two attempts to achieve this goal – both are described below.
WordNet is a large lexical database of (primarily) English words [11]. Nouns
and verbs have a hierarchical structure and are grouped into synsets (sets of
synonyms). Historically, this database played a significant role in various pre-deep-learning-era
artificial intelligence applications (and it is still used today). ImageNet [5]
is a large image database, which inherited its hierarchical structure from WordNet. At the time
of writing, it contains 14,197,122 images and 21,841 WordNet-based synsets, which makes it
an important source of ground-truth data for computer vision. The ImageNet Large Scale Visual
Recognition Challenge (abbreviated as ILSVRC) [13] was an annual competition for computer
vision researchers. The datasets released each year (subsets of the original ImageNet) form a popular
benchmark for various tasks to this day. The one released at ILSVRC 2012 is particularly
popular and is commonly called simply ImageNet2 to this date. It gained scientific attention due to the
winning architecture AlexNet [9], which greatly helped to popularise deep learning. ImageNet is
an extremely versatile dataset – architectures coping with it have usually been successful with
different datasets as well [2]. Models trained on ImageNet are widely used for transfer learning
purposes [8].
Launched in 2012, Wikidata [15] is a collaborative knowledge graph hosted by the Wikimedia
Foundation. It provides a convenient SPARQL endpoint. To this date, it is an active project
and an important source of information for, e.g., Wikipedia articles. Due to its popularity,
size, and ubiquity, Wikidata can be considered one of the most popular and successful
knowledge graph instances, along with DBpedia [1] and the Freebase-powered Google Knowledge
Graph. Given the recent interest in the ability to leverage external knowledge in computer
vision tasks [4], it would therefore be beneficial to map ImageNet classes to the correspond-
1https://github.com/DominikFilipiak/imagenet-to-wikidata-mapping
2From now on, we will refer to ILSVRC 2012 dataset as simply ImageNet, unless directly stated otherwise.
ing Wikidata entities. The idea itself is not new, though a full mapping was not available to
date. To the best of our knowledge, Nielsen [12] was the first to tackle this problem. He
summarised the problems encountered while preparing the mapping and classified them into
a few categories. These categories include missing synsets on Wikidata, matching with a disambiguation
page, discrepancies between ImageNet and WordNet, differences between
WordNet and Wikidata concepts with similar names, and multiple semantically similar items
in WordNet and Wikidata. Nielsen described his effort in detail, though the full mapping was
not published. Independently, Edwards [6] tried to map DBpedia and Wikidata to ImageNet (in
a larger sense, not ILSVRC 2012) using various pre-existing mappings and knowledge graph
embedding methods, such as TransE [3], though the results of that mapping have not been
published either. Contrary to these works, we publish our mapping.
Mapping
This section is devoted to the mapping between the ImageNet dataset and Wikidata. First,
we explain our approach to providing such a mapping. Then, we identify and group the key
issues that occurred in the mapping process. We also provide more detailed explanations
for the most problematic entities.
To prepare the mapping, we follow the approach and convention presented by Nielsen.
Namely, we use synset names from WordNet 3.0 (as opposed to, for example, WordNet 3.1).
We first check the skos:exactMatch (P2888) property for an existing mapping
between Wikidata entities and WordNet synsets. This has to be done for every ImageNet
class. For example, for the ImageNet synset n02480855 we search for P2888 equal to
http://wordnet-rdf.princeton.edu/wn30/02480855-n using the Wikidata SPARQL endpoint.
Listing 1 provides a SPARQL query for this example.
SELECT *
WHERE {
  ?item wdt:P2888 ?uri .
  FILTER STRSTARTS(STR(?uri), "http://wordnet-rdf.princeton.edu/wn30/02480855-n")
}
Listing 1. Matching WordNet with Wikidata entities using SPARQL.
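The lookup in Listing 1 can also be scripted against the public Wikidata endpoint. The Python sketch below is our own convenience wrapper built only on the query shown above; the function names and User-Agent string are illustrative assumptions.

```python
import json
import urllib.parse
import urllib.request

WDQS = "https://query.wikidata.org/sparql"

def p2888_query(synset_offset: str, pos: str = "n") -> str:
    """Build the Listing 1 query for a WordNet 3.0 synset offset (e.g. '02480855')."""
    uri = f"http://wordnet-rdf.princeton.edu/wn30/{synset_offset}-{pos}"
    return (
        "SELECT ?item WHERE { ?item wdt:P2888 ?uri . "
        f'FILTER STRSTARTS(STR(?uri), "{uri}") }}'
    )

def lookup(synset_offset: str) -> list:
    """Run the query against the public endpoint; return matching entity IRIs."""
    url = WDQS + "?" + urllib.parse.urlencode(
        {"query": p2888_query(synset_offset), "format": "json"}
    )
    req = urllib.request.Request(url, headers={"User-Agent": "mapping-check/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [b["item"]["value"] for b in data["results"]["bindings"]]
```

Calling `lookup("02480855")` repeats the example above for one class; iterating over all 1000 synset offsets reproduces the per-class check described in the text.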
As of November 2020, 461 out of 1000 ImageNet synsets were already linked via the
wdt:P2888 property. For the rest, the mapping has to be provided. Unlike Edwards [6], we
do not rely on automated methods, since the remaining 539 entities can be checked by hand
(although we test one automated method in the next section). Using manual search on Google Search, we
found good skos:exactMatch candidates for the next 467 ImageNet classes. These matches
can be considered for direct addition to Wikidata, as they reflect exactly the same concept.
For the vast majority of cases, a simple heuristic was enough: type the synset
name into the search engine, check the first Wikipedia result, evaluate its fitness, and then use
its Wikidata item link. Using this method, one can link 928 classes in total (with 467 entities
matched by hand).
Sometimes, two similar concepts were yielded. Such synsets were the subject of a qualitative
analysis aimed at providing the best match. Similarly, sometimes there is no good
match at all. At this stage, 72 out of 1000 classes remain unmatched. Here, we list our
proposals for them. We encountered problems similar to Nielsen's [12], though we propose a
different summary of common issues. We categorised them into the following problem categories:
Table 1. Mapping – hyponymy.
WordNet 3.0 synset                                  Wikidata entity
n03706229 (magnetic compass)                        Q34735 (compass)
n02276258 (admiral)                                 Q311218 (vanessa)
n03538406 (horse cart)                              Q235356 (carriage)
n03976467 (Polaroid camera, Polaroid Land camera)   Q313695 (instant camera)
n03775071 (mitten)                                  Q169031 (glove)
n02123159 (tiger cat)                               Q1474329 (tabby cat)
n03796401 (moving van)                              Q193468 (van)
n04579145 (whisky jug)                              Q2413314 (jug)
n09332890 (lakeshore)                               Q468756 (shore)
n01685808 (whiptail, whiptail lizard)               Q1004429 (Cnemidophorus)
n03223299 (doormat)                                 Q1136834 (mat)
n12768682 (buckeye, horse chestnut, conker)         Q11009 (nut)
n03134739 (croquet ball)                            Q18545 (ball)
n09193705 (alp)                                     Q8502 (mountain)
n03891251 (park bench)                              Q204776 (bench)
hyponymy; animals and their size, age, and sex; ambiguous synsets; and non-exact matches.
Each of these represents a different form of trade-off made in order to provide the full mapping.
This is not a classification in a strict sense, as some of the cases could be placed in several of
the aforementioned groups.
Hyponymy. This is the situation in which the level of granularity of WordNet synsets did not
match the one from Wikidata. As a consequence, some terms did not have a dedicated entity.
Therefore, we performed semantic inclusion, in which we searched for a more general “parent”
entity, which contained this specific case. Examples include magnetic compass (extended to
compass), mitten (glove), or whisky jug (jug). The cases falling into this category are presented
in Table 1.
Animals and their size, age, and sex. This set of patterns is actually a subcategory of
hyponymy, but these animal-related nouns posed several problems worth distinguishing.
The first one concerns a situation in which a WordNet synset describes a particular sex of
a given animal. This information is often missing on Wikidata, which means that the broader
semantic meaning has to be used. For example, drake was mapped to duck, whereas ram, tup
to sheep. However, while hen was linked to chicken, for cock, rooster (n01514668) there exists
an exact match (Q2216236). Another pattern concerns distinguishing animals of different age
and size. For instance, lorikeet in WordNet is defined as “any of various small lories”. As this
definition is a bit imprecise, we decided to use loriini. In another example, eft (a juvenile newt) was
linked to newt. Similarly, there are eastern and western green mambas, but WordNet defines green mamba as
“the green phase of the black mamba”. The poodle breed has three varieties (toy, miniature,
and standard poodle), but Wikidata does not distinguish between them – all were
therefore linked to poodle (Q38904). These mappings are summarised in Table 2.
Ambiguous synsets. This is a situation in which a set of synonyms does not necessarily
consist of synonyms (at least in terms of Wikidata entities). That is, for a synset containing at
least two synonyms, there is more than one possible Wikidata entity. At the same time, a broader
term for a semantic extension does not necessarily exist, since these concepts can be mutually
exclusive. For instance, for the synset African chameleon, Chamaeleo chamaeleon there exist
two candidates on Wikidata: Chamaeleo chamaeleon and Chamaeleo africanus. We chose
the first one due to the WordNet definition – “a chameleon found in Africa”. Another synset,
Table 2. Mapping – animals and their size, age, and sex.
WordNet 3.0 synset                Wikidata entity
n01847000 (drake)                 Q3736439 (duck)
n01514859 (hen)                   Q780 (chicken)
n01806143 (peacock)               Q201251 (peafowl)
n02412080 (ram, tup)              Q7368 (sheep)
n01820546 (lorikeet)              Q15274050 (loriini)
n01631663 (eft)                   Q10980893 (newt)
n01749939 (green mamba)           Q194425 (mamba)
n02113624 (toy poodle)            Q38904 (poodle)
n02113712 (miniature poodle)      Q38904 (poodle)
n02113799 (standard poodle)       Q38904 (poodle)
n02087046 (toy terrier)           Q37782 (English Toy Terrier)
academic gown, academic robe, judge’s robe contains at least two quite different notions –
we chose academic dress, as this meaning seems to be dominant in ImageNet.
Harvester, reaper is an imprecise category in ImageNet, since it contains a variety of agricultural
tools, not only those suggested by the synset name. Bonnet, poke bonnet has a match in
Wikidata’s bonnet, though it is worth noting that ImageNet focuses on babies wearing this
specific headwear. The mapping of this category can be found in Table 3.
Non-exact match. Sometimes, however, there is no good exact match for a given synset
among Wikidata entities. At the same time, a broader term might be too broad. This leads to
unavoidable inaccuracies. For example, for nipple we chose its meronym, baby bottle.
Interestingly, nipple exists in the Polish Wikidata, though it does not have any properties, which
makes it useless in further applications. Other examples involve somewhat similar meanings:
tile roof was mapped to roof tile, and steel arch bridge to through arch bridge. Plate rack
was linked to dish drying cabinet, though this is not entirely accurate, as ImageNet contains
pictures of racks used not for drying but sometimes for dish display. In another
example, we mapped baseball player to Category:baseball players. ImageNet contains photos
of different kinds of stemware, not only goblets. Cassette was linked to a more fine-grained
entity (audio cassette), as the images present audio cassettes in different settings. Table 4
summarises the mappings falling into this category.
ImageNet itself is not free from errors, since it is biased towards certain skin colours, genders,
and ages. This is a great concern for researchers in ethical artificial intelligence, since models trained
on ImageNet are ubiquitous. There are, though, ongoing efforts to fix this with a more balanced
set of images [19]. Beyer et al. [2] enlisted numerous problems with ImageNet, such
as a single label per image, a restrictive annotation process, and practically duplicate classes. They
proposed a set of new, more realistic labels (ReaL) and argued that models trained in such a
setting achieve better performance. Even given these drawbacks, ImageNet is still ubiquitous.
Naturally, the presented mapping inherits the problems present in ImageNet, such as cases in
which images do not quite present what the synset name suggests. This problem was previously
reported by Nielsen [12], who described it as a discrepancy between ImageNet and
WordNet. Examples include radiator, which in ImageNet represents a
home radiator, whereas the Wikidata definition for the same name describes a somewhat
broader notion (it also includes, for instance, car radiators). Monitor is a similar example, since it
might be any display device, though in ImageNet it is connected mostly to computer displays.
Sunscreen, sunblock, sun blocker represents photos of both products and their application on
the human body, which look completely different and might be split into two distinct classes.
Table 3. Mapping – ambiguities.
WordNet 3.0 synset                                                            Wikidata entity
n01694178 (African chameleon, Chamaeleo chamaeleon)                           Q810152 (Chamaeleo africanus)
n02669723 (academic gown, academic robe, judge’s robe)                        Q1349227 (academic dress)
n02894605 (breakwater, groin, groyne, mole, bulwark, seawall, jetty)          Q215635 (breakwater)
n01755581 (diamondback, diamondback rattlesnake, Crotalus adamanteus)         Q744532 (eastern diamondback rattlesnake)
n03110669 (cornet, horn, trumpet, trump)                                      Q202027 (cornet)
n03899768 (patio, terrace)                                                    Q737988 (patio)
n04258138 (solar dish, solar collector, solar furnace)                        Q837515 (solar collector)
n03016953 (chiffonier, commode)                                               Q2746233 (chiffonier)
n02114548 (white wolf, Arctic wolf, Canis lupus tundrarum)                    Q216441 (Arctic wolf)
n13133613 (ear, spike, capitulum)                                             Q587369 (Pseudanthium)
n04509417 (unicycle, monocycle)                                               Q223924 (unicycle)
n01729322 (hognose snake, puff adder, sand viper)                             Q5877356 (hognose)
n01735189 (garter snake, grass snake)                                         Q1149509 (garter snake)
n02017213 (European gallinule, Porphyrio porphyrio)                           Q187902 (Porphyrio porphyrio)
n02013706 (limpkin, aramus pictus)                                            Q725276 (limpkin)
n04008634 (projectile, missile)                                               Q49393 (projectile)
n09399592 (promontory, headland, head, foreland)                              Q1245089 (promontory)
n01644900 (tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui)   Q2925426 (tailed frog)
n02395406 (hog, pig, grunter, squealer, Sus scrofa)                           Q787 (pig)
n02443114 (polecat, fitch, foulmart, foumart, Mustela putorius)               Q26582 (Mustela putorius)
n03017168 (chime, bell, gong)                                                 Q101401 (bell)
n02088466 (bloodhound, sleuthhound)                                           Q21098 (bloodhound)
n03595614 (jersey, t-shirt)                                                   Q131151 (t-shirt)
n03065424 (coil, spiral, volute, whorl, helix)                                Q189114 (spiral)
n03594945 (jeep, land rover)                                                  Q946596 (off-road vehicle)
n01753488 (horned viper, cerastes, sand viper, horned asp, Cerastes cornutus) Q1476343 (Cerastes cerastes)
n03496892 (harvester, reaper)                                                 Q1367947 (reaper)
n02869837 (bonnet, poke bonnet)                                               Q1149531 (bonnet)
Table 4. Mapping – non-exact matches.
WordNet 3.0 synset                      Wikidata entity
n04311004 (steel arch bridge)           Q5592057 (through arch bridge)
n01737021 (water snake)                 Q2163958 (common water snake)
n07714571 (head cabbage)                Q35051 (white cabbage)
n01871265 (tusker)                      Q7378 (elephant)
n04344873 (studio coach, day bed)       Q19953097 (sofa bed)
n03961711 (plate rack)                  Q1469010 (dish drying cabinet)
n04505470 (typewriter keyboard)         Q46335 (typewriter)
n03825788 (nipple)                      Q797906 (baby bottle)
n04435653 (tile roof)                   Q268547 (roof tile)
n09835506 (baseball player)             Q7217606 (Category:baseball players)
n02966687 (carpenter’s kit, tool kit)   Q1501161 (toolbox)
n02860847 (bobsleigh – sleigh)          Q177275 (bobsleigh – sport)
n04493381 (tub, vat)                    Q152095 (bathtub)
n03443371 (goblet)                      Q14920412 (stemware)
n02978881 (cassette)                    Q149757 (audio cassette)
Analytics
We also check to what extent the process can be automated, as this might be useful for larger
subsets of ImageNet (in a broad sense). In this section, we present the results of such an
investigation. We also provide a concise analysis of the number of direct properties, which is a
crucial feature in view of the future usage of the mapping in various computer vision settings.
Foppiano and Romary developed entity-fishing [7], a tool for named entity recognition and
disambiguation (NERD). This tool can be employed to provide an automatic
mapping between ImageNet and Wikidata. We used indexed data built from the Wikidata
and Wikipedia dumps of 20.05.2020. For this experiment, each synset is split on commas.
For example, the synset barn spider, Araneus cavaticus (n01773549) is split into two synset
elements: barn spider and Araneus cavaticus. For each of these elements, the term lookup
service from entity-fishing is called, which searches the knowledge base for the given term in
order to provide match candidates. Since this service provides a list of entities ranked by their
conditional probability, we choose the one with the highest value.
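The splitting and candidate-selection steps described above can be sketched as follows. This is our own illustration: `best_candidate` assumes a list of (entity, conditional probability) pairs as a stand-in for whatever the entity-fishing term lookup service actually returns.

```python
def split_synset(synset: str):
    """Split a synset name on commas into its elements, as done in the experiment."""
    return [part.strip() for part in synset.split(",") if part.strip()]

def best_candidate(candidates):
    """Pick the entity with the highest conditional probability.

    `candidates` is a list of (wikidata_id, probability) pairs, a hypothetical
    stand-in for the real entity-fishing response format.
    """
    return max(candidates, key=lambda pair: pair[1])[0] if candidates else None

elements = split_synset("barn spider, Araneus cavaticus")
choice = best_candidate([("Q1306991", 0.9), ("Q1225", 0.05)])
```

Each element would be passed to the term lookup service separately, and the top-ranked candidate kept per element.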
We start with the 461 already linked instances, which can be perceived as ground-truth data
for this experiment. Among them, for 387 (84%) synsets there was at least one correct
suggestion (for example, for the previously mentioned synset, at least one of barn spider and Araneus
cavaticus was matched to Q1306991). In particular, 286 (62%) synsets were correctly
matched for all their elements (for example, barn spider and
Araneus cavaticus were both matched to Q1306991). While these results show that NERD tools
can speed up the linking process by narrowing down the number of entities to be searched
in some cases, they do not replace manual mapping completely – especially in the more
complicated and ambiguous cases mentioned in the previous section. Nevertheless,
an identical NERD experiment has been performed for the remaining 539 manually linked
synsets, which resulted in similar figures. For 448 (83%) synsets, entity-fishing provided
the same match for at least one synset element. Similarly, for 342 synsets (63%) the tool
yielded the same match for all elements. Although these figures are relatively high, they
show that a mapping obtained this way may contain discrepancies, which justifies the
manual process presented in the previous section.
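The two figures reported above (at least one element correct, all elements correct) can be computed with a short sketch. The data structures here are hypothetical stand-ins for the experiment's actual bookkeeping, not code from the study.

```python
def match_rates(gold, suggestions):
    """Return (share of synsets with >= 1 correct element,
               share of synsets with all elements correct).

    gold: dict synset name -> correct Wikidata ID
    suggestions: dict synset name -> {element: suggested Wikidata ID or None}
    """
    any_hits = all_hits = 0
    for synset, correct in gold.items():
        matches = [qid == correct for qid in suggestions[synset].values()]
        any_hits += any(matches)
        all_hits += bool(matches) and all(matches)
    n = len(gold)
    return any_hits / n, all_hits / n

# toy example with two synsets
gold = {"s1": "Q1", "s2": "Q2"}
sugg = {"s1": {"a": "Q1", "b": "Q9"}, "s2": {"a": "Q2"}}
any_rate, all_rate = match_rates(gold, sugg)
```

Applied to the 461 ground-truth synsets, these two rates correspond to the reported 84% and 62%.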
Table 5. Most popular properties in the mapping (occurrences of the same property for a given
entity were counted as one).
property label                                 count
P646 (Freebase ID)                             932
P373 (Commons category)                        927
P18 (image)                                    911
P8408 (KBpedia ID)                             687
P910 (topic’s main category)                   681
P279 (subclass of)                             659
P1417 (Encyclopædia Britannica Online ID)      618
P8814 (WordNet 3.1 Synset ID)                  551
P31 (instance of)                              524
P1014 (Art & Architecture Thesaurus ID)        482
Similarly to Nielsen, we also count the number of direct properties available in Wikidata.
This is a crucial feature, since it enables leveraging the knowledge graph structure. Listing 2 shows
the query used for obtaining the number of properties for Q29022. The query was repeated for
each mapped entity. Figure 1 depicts a histogram of direct properties for the 1000 mapped
classes. The histogram presents a right-skewed distribution (normal-like after taking the
natural logarithm) with a mean of 28.28 (σ = 22.77). Only one entity has zero properties (wall
clock).
In total, 992 distinct Wikidata entities are used in the mapping, as some of them were used
several times, like the mentioned poodle. These entities exhibit 626 unique properties in total.
The most popular ones are listed in Table 5. From a computer vision perspective, an important
subset of these properties includes P373 (Commons category), P910 (topic’s main category),
and P279 (subclass of), as they directly reflect hyponymy and hypernymy within the knowledge
graph. Such information can later be leveraged to detect (dis-)similar nodes in
a direct way, for example using graph path distance in SPARQL for entities sharing common
ancestors with respect to a given property. However, SPARQL does not allow counting the
number of arbitrary properties between two given entities. Using graph embeddings is a potential
workaround for this issue. For example, one can calculate distances between the 200-dimensional
pre-trained embeddings provided by the PyTorch-BigGraph library [10]. Another possible direction
considers leveraging other linked knowledge graphs, such as Freebase (P646), which is
linked to the majority of the considered instances.
SELECT (COUNT(?property) AS ?count)
WHERE {
  wd:Q29022 ?property [] .
  FILTER STRSTARTS(STR(?property), "http://www.wikidata.org/prop/direct/")
}
Listing 2. Counting direct properties for a single mapped entity. Based on: Nielsen [12].
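The embedding-based workaround mentioned above boils down to a simple vector distance. The sketch below is our own illustration; the entity vectors are tiny stand-ins for the 200-dimensional pre-trained PyTorch-BigGraph embeddings.

```python
import math

def cosine_distance(u, v):
    """1 minus cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

# toy stand-ins for pre-trained entity embeddings (real ones are 200-dimensional)
emb = {"Q34735": [0.1, 0.9, 0.2], "Q169031": [0.2, 0.8, 0.1]}
d = cosine_distance(emb["Q34735"], emb["Q169031"])
```

Unlike a SPARQL path query, such a distance is defined for any pair of mapped entities regardless of which properties connect them.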
Summary
In this paper, we presented a complete mapping between ILSVRC 2012 synsets and Wikidata. For
461 classes, such a mapping already existed in Wikidata via skos:exactMatch. For another 467
Figure 1. A histogram of direct properties (x-axis: number of properties, y-axis: count).
classes, we found candidates that match their corresponding synsets. Since 72 classes do not
have a direct match, we proposed a detailed justification of our choices. We also compared our
mapping with one obtained from an automated process. To the best of our knowledge, we
are the first to publish a mapping between ImageNet and Wikidata. The mapping is publicly available
for use and validation in various computer vision scenarios.
Future work should focus on empirically testing the mapping. Our results are intended to
benefit general-purpose computer vision research, since knowledge graphs can be leveraged
as a source of contextual information for various tasks, and our analysis showed that the vast
majority of the linked entities have a substantial number of direct properties. This fact can be
utilised according to the given computer vision task. For example, it may be used to generate
low-level entity (label) embeddings and calculate distances between them in order to create the
correlation matrix used in Knowledge Graph Transfer Network [4] for few-shot image
classification. This architecture leverages prior knowledge regarding the semantic similarity
of the considered labels (called a correlation matrix in the paper), which is used for creating class
prototypes. These prototypes help the classifier learn novel categories with only a few
samples available. Correlations might be calculated using simple graph path distance, as well
as more sophisticated low-dimensional knowledge graph embeddings with some distance
metric between each pair of instances. In this case, this will result in a 1000×1000 matrix, as there
are 1000 labels in ImageNet. Embeddings from pre-trained models (such as the aforementioned
PyTorch-BigGraph embeddings) might be used for this task.
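As a hedged sketch of the correlation-matrix idea, the following builds a pairwise similarity matrix over label embeddings. The similarity function (1 / (1 + Euclidean distance)) and the toy embeddings are our own illustrative choices, not the measure used in [4].

```python
import math

def correlation_matrix(embeddings, labels):
    """Pairwise similarity matrix over class-label embeddings,
    one row and one column per label (1000x1000 for full ImageNet)."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return [
        [1.0 / (1.0 + dist(embeddings[a], embeddings[b])) for b in labels]
        for a in labels
    ]

# toy subset of mapped entities with 2-dimensional stand-in embeddings
labels = ["Q34735", "Q38904", "Q780"]
emb = {"Q34735": [0.0, 1.0], "Q38904": [1.0, 0.0], "Q780": [0.0, 0.9]}
M = correlation_matrix(emb, labels)
```

The resulting matrix is symmetric with ones on the diagonal, which matches the role of a class-similarity prior for prototype construction.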
Future work might also consider extending the mapping to larger subsets of ImageNet
(in a broad sense), such as ImageNet-6K [4], a dataset consisting of 6000 ImageNet categories.
Preparing such a large mapping might require a more systematic and collaboration-oriented
approach, which can help to create, verify, and reuse the results [20]. The presented approach
can also be used to provide mappings between other knowledge graphs and ImageNet. Another
possible application might consider further mapping to actions, which might be particularly
interesting for applications in robotics, where robots would decide which actions to take based
on such mappings [14].
CRediT – Contributor Roles Taxonomy
Dominik Filipiak: conceptualization, data curation, formal analysis, investigation, methodol-
ogy, software, validation, writing – original draft. Anna Fensel: conceptualization, funding
acquisition, project administration, writing – review & editing, validation, resources. Agata Fil-
ipowska: writing – review & editing, validation, resources.
Acknowledgements
This research was co-funded by the Interreg Österreich-Bayern 2014-2020 programme project
KI-Net: Bausteine für KI-basierte Optimierungen in der industriellen Fertigung (grant agreement:
AB 292).
References
[1] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives. Dbpedia: A nucleus
for a web of open data. In The semantic web, pages 722–735. Springer, 2007.
[2] L. Beyer, O. J. Hénaff, A. Kolesnikov, X. Zhai, and A. v. d. Oord. Are we done with
imagenet? arXiv preprint arXiv:2006.07159, 2020.
[3] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko. Translating em-
beddings for modeling multi-relational data. Advances in neural information processing
systems, 26:2787–2795, 2013.
[4] R. Chen, T. Chen, X. Hui, H. Wu, G. Li, and L. Lin. Knowledge graph transfer network for
few-shot recognition. In AAAI, pages 10575–10582, 2020.
[5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale
hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern
Recognition, pages 248–255. IEEE, 2009.
[6] C. Edwards. Linking knowledge graphs and images using embeddings. https://
cnedwards.com/files/studyabroad_report.pdf, 2018.
[7] L. Foppiano and L. Romary. entity-fishing: a dariah entity recognition and disambiguation
service. Journal of the Japanese Association for Digital Humanities, 5(1):22–60, 2020.
[8] M. Huh, P. Agrawal, and A. A. Efros. What makes imagenet good for transfer learning?
arXiv preprint arXiv:1608.08614, 2016.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017.
[10] A. Lerer, L. Wu, J. Shen, T. Lacroix, L. Wehrstedt, A. Bose, and A. Peysakhovich. PyTorch-BigGraph: A Large-scale Graph Embedding System. In Proceedings of the 2nd SysML Conference, Palo Alto, CA, USA, 2019.
[11] G. A. Miller. Wordnet: a lexical database for english. Communications of the ACM,
38(11):39–41, 1995.
[12] F. Å. Nielsen. Linking imagenet wordnet synsets with wikidata. In Companion Proceedings of the The Web Conference 2018, pages 1809–1814, 2018.
[13] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. Imagenet large scale visual recognition challenge. International journal of computer vision, 115(3):211–252, 2015.
[14] I. Stavrakantonakis, A. Fensel, and D. Fensel. Matching web entities with potential actions.
In SEMANTICS (Posters & Demos), pages 35–38. Citeseer, 2014.
[15] D. Vrandečić and M. Krötzsch. Wikidata: a free collaborative knowledgebase. Communications of the ACM, 57(10):78–85, 2014.
[16] W. Wang, V. W. Zheng, H. Yu, and C. Miao. A survey of zero-shot learning: Settings,
methods, and applications. ACM Transactions on Intelligent Systems and Technology
(TIST), 10(2):1–37, 2019.
[17] Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni. Generalizing from a few examples: A survey on
few-shot learning. ACM Computing Surveys (CSUR), 53(3):1–34, 2020.
[18] H.-M. Yang, X.-Y. Zhang, F. Yin, and C.-L. Liu. Robust classification with convolutional prototype learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3474–3482, 2018.
[19] K. Yang, K. Qinami, L. Fei-Fei, J. Deng, and O. Russakovsky. Towards fairer datasets:
Filtering and balancing the distribution of the people subtree in the imagenet hierarchy.
In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency,
pages 547–558, 2020.
[20] A. V. Zhdanova and P. Shvaiko. Community-driven ontology matching. In European Semantic Web Conference, pages 34–49. Springer, 2006.