Towards an ontology based on Hallig-Wartburg’s Begriffssystem for Historical
Linguistic Linked Data
Sabine Tittel, Frances Gillis-Webber, Alessandro A. Nannini
Heidelberg Academy of Sciences and Humanities, University of Cape Town, University of Vienna
Heidelberg, Germany, Cape Town, South Africa, Vienna, Austria
sabine.tittel@urz.uni-heidelberg.de, fran@fynbosch.com, alessandro.alfredo.nannini@univie.ac.at
Abstract
To empower end users in searching for historical linguistic content with a performance that far exceeds the research functions offered by
websites of, e.g., historical dictionaries, is undoubtedly a major advantage of (Linguistic) Linked Open Data ([L]LOD). An important
aim of lexicography is to enable a language-independent, onomasiological approach, and the modelling of linguistic resources following
the LOD paradigm facilitates the semantic mapping to ontologies making this approach possible. Hallig-Wartburg’s Begriffssystem
(HW) is a well-known extra-linguistic conceptual system used as an onomasiological framework by many historical lexicographical
and lexicological works. Published in 1952, HW has meanwhile been digitised. With proprietary XML data as the starting point, our
goal is the transformation of HW into Linked Open Data in order to facilitate its use by linguistic resources modelled as LOD. In this
paper, we describe the particularities of the HW conceptual model and the method of converting HW: We discuss two approaches, (i) the
representation of HW in RDF using SKOS, the SKOS thesaurus extension, and XKOS, and (ii) the creation of a lightweight ontology
expressed in OWL, based on the RDF/SKOS model. The outcome is illustrated with use cases of medieval Gascon and Italian.
Keywords: Historical Linguistics, Linked Open Data, Ontology Authoring
1. Introduction
As the most solid grounding of the Semantic Web, the
Linked Data (LD) paradigm is used to represent and inter-
link structured data on the web. The standard proposed by
the W3C for representing LD (LOD respectively, with ‘O’
symbolising open access) is the graph data model Resource
Description Framework (RDF) that represents data in the
form of triples with subject, predicate, and object, each
identified through URIs that are accessible via HTTP (Cy-
ganiak et al., 2014). There are many advantages to repre-
senting linguistic resources in RDF, and applying LD prin-
ciples to them, such as structural and conceptual interop-
erability, uniform access through standard Web protocols,
and resource integration and federation (Chiarcos et al.,
2013). Representing dictionary data as Linguistic Linked
Open Data (LLOD) is a very promising approach, espe-
cially as it allows for interoperability among different lexi-
cographic resources through the use of common vocabular-
ies that have emerged for the modelling of linguistic data.
The OntoLex-lemon vocabulary (Cimiano et al., 2016) has
been established as the de facto standard RDF data model
for LLOD; it provides the framework for the representation
of language data such as lexical entries, their written rep-
resentations, and their meanings. The data modelled with
OntoLex-lemon can easily be integrated by linking to exter-
nal resources, such as ontologies for linguistic annotations
(e.g., LexInfo¹), and extra-linguistic information, such as
place names (e.g., TGN²). We point out that the typical
scenario of (historical) linguistic research is characterised
by poor data accessibility because it relies on searching for
words and their formal representations across resources of
different languages and language stages. This scenario hampers
semantically driven research into the meanings of the words,
particularly for historical language data with non-standardised
word spelling. To facilitate access independent of the
words and their formal representations, the data modelling
must, hence, also be enriched by semantic mapping (of
entries, senses, concepts) to appropriate ontologies that
depict the ‘real world’ (DBpedia³, AGROVOC⁴, AAT⁵, etc.).
The use of an external extra-linguistic ontology as a
cross-mapping hub for linguistic resources, especially for
historical resources, can overcome the typical,
word-form-driven research scenario. This is facilitated by
OntoLex-lemon and its “principle of semantics by reference in
the sense that the semantics of a lexical entry is expressed by
reference to an individual, class or property defined in an
ontology” (Cimiano et al., 2016, 2.1). One such ontology
(in the philosophical meaning of the term) is the so-called
Hallig-Wartburg (HW), first published in 1952 (second edition
1963): Begriffssystem als Grundlage für die Lexikographie
(Hallig and von Wartburg, 1963). In this paper, we focus on
the use of HW by linguistic resources and on its transition
from a printed book to an LOD resource in order to facilitate
its use by linguistic resources on the Semantic Web.
¹ https://lexinfo.net/ [12-02-2020].
² https://www.getty.edu/research/tools/vocabularies/tgn/index.html [12-02-2020].
³ https://wiki.dbpedia.org/ [12-02-2020].
⁴ http://agrovoc.uniroma2.it/agrovoc/agrovoc/en/ [13-02-2020].
⁵ https://www.getty.edu/research/tools/vocabularies/aat/ [12-02-2020].
The remainder of the paper is structured as follows: In
section 2., we describe the role of HW for linguistic
resources of historical language stages that have been, or
are intended to be, modelled as LOD. In section 3., we discuss
an attempt to convert HW from the original book, via an XML
digitisation, into an LOD resource that can be used for
semantic mapping. In light of the requirements of the LOD
paradigm, we first evaluate a thesaurus-like RDF/SKOS
model in section 3.1.; in section 3.2., we discuss its further
conversion to an ontological model, and we show its practical
application with the use case of data from two historical
dictionaries, DAG and LEI, in section 4. Our approach
reveals difficulties and shortcomings both with respect to a
re-engineering of the ontological model and to the
conceptual scheme of HW itself, which we discuss in section 5.
2. Onomasiological Lexicography and the
use of Hallig-Wartburg’s Begriffssystem
Traditional lexicography either follows a semasiological
approach in presenting dictionary data, i.e., the data is or-
dered by the words, or an onomasiological approach, i.e.,
the data is ordered by the meaning of the words. For an
onomasiological approach, a thesaurus-like categorisation
of the world is needed as a structuring means. Resources re-
ferred to as thesauri include the Historical Thesaurus of the
Oxford English Dictionary (HTOED) (Kay, 2009), Roget’s
Thesaurus of English words and phrases (first edition Lon-
don 1852, Davidson (2002)), and Dornseiff’s Der deutsche
Wortschatz nach Sachgruppen (Dornseiff, 1934). Possibly
the best-known example of a thesaurus-like categorisation
of the world used within Romance philology and the refer-
ence work of the discipline is Hallig-Wartburg.
2.1. Structure of Hallig-Wartburg
Hallig’s and Wartburg’s Begriffssystem—German for ‘sys-
tem of concepts’—is a conceptual scheme in that it is a
controlled vocabulary with a hierarchically structured set
of concepts. At first glance, it seems to be a thesaurus-
like resource. However, ISO 25964 defines a thesaurus as
a “controlled and structured vocabulary in which concepts
are represented by terms, organized so that relationships
between concepts are made explicit, and preferred terms
are accompanied by lead-in entries for synonyms or quasi-
synonyms”, a term being a “word or phrase used to label
a concept” and a concept being a “unit of thought” (In-
ternational Organization for Standardization, 2011). The
terms come from the vocabulary of one or several natu-
ral language(s) meaning that they are lexicalised in that
language and typically expressed with equivalence rela-
tionships (synonyms, quasi-synonyms or antonyms) in the
thesaurus (Kless et al., 2012a; Kless et al., 2012b); cp.
also Helou et al. (2014) on ontology entities expressed
in natural language by associating them with terms. The
lexicalisation of the labelling terms is the decisive factor
for the classification of HW as not compliant with ISO
25964. HW does not provide lexicalised terms in a nat-
ural language. HW, unlike thesauri such as HTOED, Ro-
get, and also the thesaurus-like, lexical database WordNet
(Fellbaum, 1998), does not spring from a list of words of
a natural language (the ‘terms’), e.g., of a (semasiolog-
ically structured) dictionary, word list or similar source.
Instead, it is meant to be a resource for the use of, e.g.,
onomasiologically structured dictionaries: It is an extra-
linguistic reference system of the real world reflecting the
model of thought of a ‘talented average person’ (HW 12),
independent from language and with an a priori character
(“ein empirisches, aus sprachlichen Allgemeinbegriffen
bestehendes, [...] auf phänomenologischer Grundlage
beruhenden Gliederungsprinzipien gestaltetes außersprachliches
Bezugssystem”, ib. 21). HW contains approx. 1675
non-lexicalised concepts ordered in a nine-level hierarchy.
It is clear that a concept must be communicated by a sign,
and, indeed, the HW concepts are denoted by words of the
French language. However, these words are only vehicles
and, thus, arbitrary: HW makes it explicit that the words,
e.g., ‘La mer’, are mere symbols of the concepts and not
to be misunderstood as lexemes of the French lexicon (ib.
16; 72). This can be illustrated by, e.g., périodique
(periodical) and quotidien (daily) that are both sub-concepts of
the concept of fois (time [occasion]), not of ‘period’ and
‘day’, respectively (ib. 17). As a consequence, concepts
may occur several times (with cross-references), e.g., ‘fish-
ing’ both as an occupation and a sport (ib. 73). The authors
of HW were aware of possible misunderstandings and point
out that a particular identification of the emblematic char-
acter of the French words, e.g., through square brackets,
would have been useful but that they refrained from this for
the sake of readability (ib.).
The concepts of the upper six levels of the hierarchy
are denoted by French non-lexicalised categories, e.g.,
‘L’univers’, ‘Le ciel et l’atmosphère’, and ‘Le ciel et les
corps célestes’, and, additionally, the concepts are
identified by a system of capital letters between A and C,
followed by Roman numerals, lower-case letters, Arabic
numerals, etc.: ‘A’, ‘A I’, ‘B II h’, etc. This six-level
hierarchy forms the ‘Plan’ with 524 concepts, the outline with
the logical abstraction of concepts representing broader
conceptual fields, cf. HW 101–112 (Fig. 1).
Figure 1: The ‘Plan’ (extract), HW 103.
In HW 113–229, the six conceptual levels of the ‘Plan’ are
then further extended by another, up to three-level hierarchy
of approx. 1,150 finer-grained concepts for “lexicography
proper as represented by the ‘words’ classified in its appli-
cation” (Orr, cited by HW 20, footnote 4), which we will
refer to as ‘Application’ in the following (Fig. 2). These
concepts are not consecutively numbered.
Figure 2: Finer-grained ‘Application’ (extract), HW 141.
Thesauri (and this applies to a conceptual scheme such as
HW as well) establish hierarchical relationships and as-
sociative relationships between concepts. The hierarchi-
cal relationships can be generic, a whole-part relation, and
a concept-instance relation; the associative relationships
exist between hierarchically unrelated but semantically or
conceptually related concepts (Kless et al., 2012a, 135f.).
HW contains hierarchical (both generic and whole-part re-
lations) and also associative relationships between the con-
cepts (HW 18); neither cyclic hierarchical relationships nor
orphans. HW prioritises the hierarchical over the associa-
tive classification but deliberately prefers the latter in cases
where an association seems more ‘natural’ (ib.), particu-
larly in fields where the concepts are closely connected to
specialised domains, such as house building and hunting.
With this approach to classification, HW wants to take ac-
count of the fact that every language has its own peculiar
interpenetration of systematics and non-systematics, which
is reflected in the linguistic interpretation of the world (ib.).
E.g., the concept ‘construire’ (to construct) is neither
hierarchically allocated to ‘L’action’ (B II h 3) [together with
‘faire’ (to make) and ‘créer’ (to create)], nor to ‘L’espace’
(space, C I e) [together with ‘assembler’ (to assemble)].
Instead, it is associated with the concept of house building,
i.e., ‘La construction’ (B III b 7 bb, sub ‘L’habitation, la
maison’). The concept ‘miette’ (crumb) is logically a
sub-concept of ‘morceau’ (part, sub-concept of C I d ‘Le
nombre et la quantité’) but associated with the concept ‘Le
pain, la pâtisserie’ (bread, patisserie, B I k 1 cc 2), and
‘saumure’ (brine) is a concept associated with ‘La viande’
(meat, B I k 1 cc 1). An example of a hierarchical, whole-part
relation is the relation of the concept ‘les narines’ (nostrils)
to its superordinate concept ‘Le corps et les membres’ (the
body and its parts).
The concepts and their classification reveal problematic
congruencies, wrong hierarchisation, and inconsistencies⁶:
1. On levels 1-6, we find the identical concept
‘Généralités’ 27 times, semantically disambiguated
through its place in the hierarchy, e.g., as a
sub-concept of ‘Les arbres’; these concepts can be
suppressed since one could simply refer to the respective
superordinate concept. On levels 7-9, ‘esp.’
(abbreviating espèces, sub-species, e.g., of the apple) occurs.
2. On levels 8 and 9, we find the string ‘etc.’ as a concept
denomination.
3. On levels 7-9, some concepts are followed by
references to homonymic concept denominations (printed
in italics, separated by a comma), e.g., ‘port, v. aussi
p. 197a’.
4. On levels 7-9, some concept denominations are
specified through German definitions. In some cases, this
aims at the semantic disambiguation of homonymic
concept denominations within the same superordinate
concept, e.g., ‘beau-père “Schwiegervater”’
(father-in-law) / ‘beau-père “Stiefvater”’ (stepfather).
5. C II a 17 ‘La phonétique’ is on the same hierarchy
level as C II a 18 ‘La linguistique’ but should be a
sub-concept of the latter.
6. We find ‘alchimie’ falsely classified under A II e ‘Les
métaux’ which is a sub-concept of the top concept A
‘L’Univers’. However, this top concept should contain
only sub-concepts related to organic and inorganic
nature, and not to human activities (HW 89).
7. Similarly, under A IV ‘Les animaux’ we find ‘Les
animaux fabuleux’ (fabulous beasts) and its
sub-concepts ‘phénix’ (phoenix) and ‘dragon’ (dragon),
concepts that cannot be separated from human
conception and should, thus, rather be associated with B II e
‘L’imagination’.
8. A classification inconsistency is the presence of the
sub-concept ‘Le tabac’ (tobacco, B I k 1 dd) under
‘Les aliments’ (food, B I k 1), as if tobacco were food.
⁶ Naturally, concepts that reflect the zeitgeist of the time of
HW’s creation, e.g. ‘Les costumes nationaux et pittoresques’, are
to be found as well.
2.2. Lexicographical and Lexicological
Resources using Hallig-Wartburg
HW has been chosen by numerous lexicographical and
lexicological works as a means of semantic structure. The most
comprehensive Französisches Etymologisches Wörterbuch
(FEW) (von Wartburg, since 1922) is a dictionary of the
Galloromance languages and dialects covering the period
from the Middle Ages until today, structured by the
alphabetical order of the etyma of the treated word
families. The words of unknown or uncertain origin are treated
in vol. 21–23 where they are grouped onomasiologically,
ordered by the HW concepts. The HW concepts form
the structural backbone of the dictionaries Dictionnaire
onomasiologique de l’ancien occitan (DAO) (Baldinger,
1975 to 2005) and Dictionnaire onomasiologique de
l’ancien gascon (DAG) (Baldinger, since 1975): both
follow HW to structure the editing and publishing of the
dictionary entries (Glessgen and Tittel, 2018, 805).
Semantic criteria are used in the Lessico Etimologico Italiano
(LEI) (Pfister, since 1979) to build the structure of very
complex articles, as in the FEW 21–23 (Tancke, 1997,
466); in these cases, the lexicographical sections are
ordered by semantic categories (in Italian) that
closely recall those of HW. Recently, the online edition
of the Dictionnaire de l’occitan médiéval (DOM)
(Stempel, 1996 to 2013) started evaluating the introduction of
HW concepts to align the entries to those of DAGél.⁷ The
Dictionnaire étymologique de l’ancien français (DEAF)
(Baldinger, since 1971) follows a semasiological approach
but inherits HW categories when it refers to entries of
FEW 21–23. The Mittelhochdeutsche Begriffsdatenbank
(MHDBDB⁸) creates an onomasiological database for
Middle High German, building on HW (Hinkelmanns, 2019):
the HW categorisation has been further developed through
its application to the lexis of the Middle High German
Frauendienst by Ulrich von Lichtenstein (1255) and of Lanzelet
by Ulrich von Zatzikhoven (after 1193) (Schmidt, 1980;
Schmidt, 1988; Schmidt, 1993). Also, many onomasiologically
structured lexicological studies on medieval to 16th-century
French, Italian, Spanish, Gascon and Occitan resources
(literary texts, architecture, Bible, etc.) use HW
concepts, e.g., Bevans (1941) on the Old French vocabulary
of Champagne⁹, Keller (1953) on the vocabulary used
by Wace (* approx. 1110 – † after 1174), de Man (1956) on
the Brabant language in archival sources 1300-1550, etc.
(Baldinger, 1959, 1091f.).
⁷ Personal communication by Maria Selig, DOM.
⁸ http://mhdbdb.sbg.ac.at/ [06-02-2020].
2.3. Hallig-Wartburg in Linked Open Data
resources
As a contribution to the emerging linguistic LOD cloud and
to expand the inadequately represented historical linguistic
resources, efforts to model these lexicographic resources
as Linked Data have been initiated: The FEW is currently
digitally available as bitmap images¹⁰ but a digitisation by
means of XML is underway (Renders, 2015), and Renders
(2019) announces a study on how to model etymological
data of the FEW as LOD. For the electronic version of the
LEI, LEI-Digitale (Prifti, 2019), the LEI editors carry out
feasibility studies on LOD modelling and semantic mapping
to HW or to a taxonomy based on HW (Nannini, in
progress). Tittel and Chiarcos (2018) created an RDF data
model for the electronic version of the DEAF (DEAFél)
and Tittel (in progress) for DAGél, the electronic
complement to the DAG (Glessgen, since 2014). The relaunch of
the MHDBDB (planned for 2020) will include an RDF
version of the data (Hinkelmanns, 2019).
3. From the Begriffssystem to an Ontology
The representation of HW in RDF and SKOS, or as an ontology,
achieves compatibility with other Semantic Web
technologies and is thought to facilitate interoperability
across linguistic resources applying HW as their onomasi-
ological framework. This helps to establish the word-form-
and language-independent access to these resources: a piv-
otal motivation to model them as LOD and to include ref-
erences to the HW concepts. A potential reuse both of HW
and of the linguistic resources using HW is also thought to
be promoted by the fact that the HW RDF graph can easily
be referenced by other larger, more comprehensive and
more detailed LOD resources, independent from a natural
language. Also, recall one of the main principles of the LD
paradigm: to provide useful information (in RDF) that is
returned when navigating to a URI, i.e., provide derefer-
enceable URIs.
However, the native format of the HW is a book publication
which, thus, needs to be converted into a format compliant
with the LOD paradigm. For the digital editing of
the DAGél, the 524 numbered concepts (the ‘Plan’, Fig. 1)
of HW (second edition 1963) were digitised in 2014
using DAG’s dictionary writing system (Glessgen and
Tittel, 2018). The approx. 1,150 finer-grained concepts of the
‘Application’ (Fig. 2) were excluded from the digitisation
because the DAGél uses only the concepts of the ‘Plan’ as
its framework. As a first step towards an RDF graph based
on HW, we exported the data as XML from the DAGél’s
database. The XML structure is based on rows containing a
single kind of XML element, field, whose one attribute, name,
takes two possible values, as shown in List. 1. Alas, it does
not contain information that can easily be exploited for a
future hierarchical representation of the category levels, as
visualised in Fig. 3.
⁹ Draws on the Questionnaire, the dialectal recordings made by
Rudolf Hallig as a preparation for HW (Christmann and Böckle,
1983, 398).
¹⁰ https://apps.atilf.fr/lecteurFEW/ [accessed 05-02-2020].
<?xml version="1.0"?>
<resultset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <row>
    <field name="identifier">B I k 1 cc 1</field>
    <field name="concept">La viande</field>
  </row>
  <row>
    <field name="identifier">B I k 1 cc 2</field>
    <field name="concept">Le pain, la pâtisserie</field>
  </row>
</resultset>
Listing 1: Extract of XML data.
Figure 3: Hallig-Wartburg concept hierarchy.
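Although the export carries no explicit hierarchy, the identifiers themselves may offer a starting point. The following Python sketch reads rows shaped like those in List. 1 and derives a depth and a candidate parent identifier. It rests on an assumption that is not stated in the source: that each whitespace-separated token of an identifier (e.g., ‘B I k 1 cc 1’) corresponds to exactly one level of the ‘Plan’. The function names read_rows and depth_and_parent are hypothetical, and the derived parent is only a candidate that still needs the manual assessment of relations described in section 3.1.

```python
import xml.etree.ElementTree as ET

# Sample data with the same structure as List. 1.
SAMPLE = """<?xml version="1.0"?>
<resultset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <row>
    <field name="identifier">B I k 1 cc 1</field>
    <field name="concept">La viande</field>
  </row>
  <row>
    <field name="identifier">B I k 1 cc 2</field>
    <field name="concept">Le pain, la pâtisserie</field>
  </row>
</resultset>"""


def read_rows(xml_text):
    """Return one dict per <row>, keyed by the 'name' attribute of each <field>."""
    root = ET.fromstring(xml_text)
    return [
        {field.get("name"): (field.text or "").strip() for field in row.findall("field")}
        for row in root.findall("row")
    ]


def depth_and_parent(identifier):
    """Assumption: every whitespace-separated token marks one hierarchy level.

    Returns (depth, candidate parent identifier or None for a top concept).
    """
    tokens = identifier.split()
    parent = " ".join(tokens[:-1]) or None
    return len(tokens), parent


rows = read_rows(SAMPLE)
for row in rows:
    depth, parent = depth_and_parent(row["identifier"])
    print(row["identifier"], "| depth", depth, "| candidate parent", parent)
```

Under this assumption, both sample rows sit on level 6 of the ‘Plan’, which agrees with its six-level structure; whether the token count is reliable for all 524 identifiers would have to be checked against the data.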
3.1. Hallig-Wartburg in RDF and SKOS
HW is represented in a standard format of a Knowledge Or-
ganisation System (KOS), a system to represent classifica-
tion schemes, thesauri, taxonomies and similar structures.
The W3C has defined the Simple Knowledge Organization
System (SKOS) which provides a data model and vocab-
ulary for expressing KOSs in RDF (Miles and Bechhofer,
2009). Two types of semantic relations are distinguished
by SKOS: hierarchical and associative. The hierarchical
relation is typically represented by the ‘narrower’ and the
‘broader’ property, an associative relation is indicated by
the use of ‘related’. However, the specific nature of con-
cept relations cannot be expressed. The ISO 25964 SKOS
extension (Miles and Brickley, 2004) distinguishes finer-
grained semantic relations between the concepts and aims
at providing better interoperability between SKOS and the
thesaurus standard. It is ideal to explicitly express hier-
archical generic and whole-part relationships through the
SKOS-Thes properties ‘broaderGeneric’ and ‘broaderPar-
titive’ respectively. The associative relation expressing a
partitive relationship between concepts can be expressed
through the more specific property ‘relatedPartOf’. Ex-
tended Knowledge Organization System (XKOS) was de-
veloped to extend SKOS for statistical classification, and
one of its features, comparable to the SKOS-thesaurus ex-
tension, is to refine SKOS’ semantic properties (Cyganiak
et al., 2017). XKOS is a public working draft of a poten-
tial specification and therefore we chose to use the SKOS-
thesaurus extension to express the semantic relations, al-
though the properties of the latter are still classified as ‘un-
stable’. Nevertheless, XKOS offers possibilities to define
the classification levels of a KOS which we deem valuable
for our approach. The representation of HW’s hierarchical
and associative relationships is thus straightforward. How-
ever, the respective relations are not explicitly expressed
in the original source, and a representation in SKOS must
comprise a manual assessment of the relations.
We converted the XML data into RDF and SKOS (includ-
ing extensions), applying the following rules:
1. Since the HW is concept-based according to ISO
25964, all HW concepts can be represented as SKOS
concepts.
2. To define the hierarchy levels and their respective
members, we include XKOS ‘ClassificationLevel’.
3. We define the three concepts of the top level,
A ‘L’univers’, B ‘L’homme’, and C ‘L’homme et
l’univers’, as top concepts of the concept scheme
(List. 2, l. 7).
4. We utilize the content of XML
<field name="concept"> as the concept
denomination. To emphasize the symbolic character
of the denomination, we capitalise all characters,
eliminate French accents, and replace spaces,
punctuation marks, and apostrophes with an
underscore, e.g., L_HOMME_ET_L_UNIVERS.
5. We also utilize said content to add a SKOS
‘scopeNote’ providing information about the scope of
the concept. To remove possible ambiguity or
misunderstanding of the non-lexicalised information
(which could erroneously be read as ‘terms’), we deem a
scope note the accurate ‘translation’ of the information
given in HW.
6. In SKOS, preferred and alternative lexical labels can
be used for “generating or creating human-readable
representations of a knowledge organization system”
(Miles and Bechhofer, 2009); it is consistent with
SKOS to assign (multiple) alternative lexical label(s)
but no preferred lexical label to a resource. SKOS
does not specify whether a resource with neither of the
two lexical labels is consistent with the SKOS data
model; however, it advises including a lexical label
“in order to generate an optimum human-readable
display” (ib.). Considering this advice and
the de facto missing terms in HW that could naturally
become lexical labels, we propose to misuse the
concept denominations: We allocate an additional
function to the French words used as arbitrary symbols
by Hallig and Wartburg, interpreting them as ‘terms’
expressed through skos:altLabel, e.g., “Les
besoins de l’être humain”. This design decision aims
to compensate for the missing terms but refrains from
declaring preferred labels.
7. For backwards compatibility, we preserve the consec-
utive numbers of the upper six levels as contained in
XML <field name="identifier">, using the
SKOS ‘notation’ property; we define the string lit-
eral by a particular HW specific identification scheme
<hwIdentificationScheme>.
8. We eliminate concepts denominated by ‘etc.’, assuming
that the linguistic resources using HW as a reference
do not classify lexemes under a concept ‘etc.’
(approved by the editorial team of the DAGél).
9. Hierarchical generic relations are expressed through
skos-thes:broaderGeneric, e.g., the relation
between ‘La viande’ and ‘Les aliments’ (List. 2,
l. 39), hierarchical whole-part relations through
skos-thes:broaderPartitive, e.g., the relation
between ‘les narines’ and ‘Le corps et les membres’
(List. 2, l. 50), and associative relations through
skos-thes:relatedPartOf, e.g., the relation
between ‘miette’ and ‘Le pain, la pâtisserie’ (List. 2,
l. 45). To enable navigation from the top concept level
down into the hierarchy, we include the SKOS
‘narrower’ property (l. 25; 33; 40).
10. We distinguish homonymic concepts within the
same superordinate concept, which are, thus, not
disambiguated by different superordinate concepts,
as follows: We add a number
to the concept denomination and preserve the
German definitions that are used for the semantic
disambiguation as a SKOS ‘editorialNote’,
e.g., sub B III a 1 aa 3 (‘La parenté’), ‘beau-père’:
:BEAU_PERE_1 skos:scopeNote "beau-père"@fr ;
skos:editorialNote "Schwiegervater"@de and
:BEAU_PERE_2 skos:scopeNote "beau-père"@fr ;
skos:editorialNote "Stiefvater"@de.
We chose editorialNote over the ostensibly
obvious SKOS property definition to be able to
use the latter for a further knowledge enrichment with
accurate genus-differentiae sense definitions.
11. We eliminate references to pages with homonymic
concepts, assuming that this information will not be of
value for semantic integration.
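Rules 4 and 10 lend themselves to a short illustration. The Python sketch below is not the conversion routine used for HW; the function names denomination_to_id and number_homonyms are hypothetical. Collapsing a run of spaces and punctuation into a single underscore is an assumption, inferred from identifiers such as LE_PAIN_LA_PATISSERIE in List. 2.

```python
import re
import unicodedata
from collections import Counter


def denomination_to_id(denomination):
    """Rule 4: capitalise, drop accents, replace spaces/punctuation/apostrophes.

    Assumption: runs of non-alphanumeric characters become ONE underscore.
    Characters without a canonical decomposition (e.g. 'œ') are not handled
    specially and end up as an underscore as well.
    """
    decomposed = unicodedata.normalize("NFD", denomination)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^A-Z0-9]+", "_", without_accents.upper()).strip("_")


def number_homonyms(sibling_ids):
    """Rule 10: append _1, _2, ... to identifiers that recur among siblings."""
    totals = Counter(sibling_ids)
    seen = Counter()
    numbered = []
    for concept_id in sibling_ids:
        if totals[concept_id] > 1:
            seen[concept_id] += 1
            numbered.append(f"{concept_id}_{seen[concept_id]}")
        else:
            numbered.append(concept_id)
    return numbered


print(denomination_to_id("L’homme et l’univers"))
print(denomination_to_id("Le pain, la pâtisserie"))
print(number_homonyms([denomination_to_id("beau-père")] * 2))
```

The numbering is applied per group of siblings only, since homonyms under different superordinate concepts are already disambiguated by their place in the hierarchy.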
The result is shown in List. 2; the data is provided in Turtle
syntax (Prud’hommeaux and Carothers, 2014).¹¹
 1  @prefix : <http://example.org/hallig-wartburg#> .
 2
 3  :HW a skos:ConceptScheme ;
 4      skos:prefLabel "HW classification scheme"@en ;
 5      xkos:numberOfLevels 9 ;
 6      xkos:levels ( :HW_Level1 ... :HW_Level9 ) ;
 7      skos:hasTopConcept :L_HOMME , :L_UNIVERS , ... .
 8
 9  :hwIdentificationScheme a rdfs:Datatype ;
10      rdfs:comment "HW concept identification scheme" ;
11      owl:oneOf (
12          "B"^^xsd:string
13          "BIk1cc1"^^xsd:string
14          "BIk1cc2"^^xsd:string ... ) .
15  :HW_Level1 a xkos:ClassificationLevel ;
16      xkos:depth 1 ;
17      skos:member :L_UNIVERS , :L_HOMME ,
18          :L_HOMME_ET_L_UNIVERS .
19  :L_HOMME a skos:Concept ;
20      skos:altLabel "L’homme"@fr ;
21      skos:scopeNote "L’homme"@fr ;
22      skos:notation "B"^^:hwIdentificationScheme ;
23      skos:inScheme :HW ;
24      skos:topConceptOf :HW ;
25      skos:narrower :L_HOMME_ETRE_PHYSIQUE .
26  :L_HOMME_ETRE_PHYSIQUE a skos:Concept ;
27
28      skos:altLabel "L’homme, être physique"@fr ;
29      skos:scopeNote "L’homme, être physique"@fr ;
30      skos:notation "B I"^^:hwIdentificationScheme ;
31      skos:inScheme :HW ;
32      skos-thes:broaderGeneric :L_HOMME ;
33      skos:narrower :LE_SEXE , :LA_RACE , ... .
34  :LA_VIANDE a skos:Concept ;
35      skos:altLabel "La viande"@fr ;
36      skos:scopeNote "La viande"@fr ;
37      skos:notation "BIk1cc1"^^:hwIdentificationScheme ;
38      skos:inScheme :HW ;
39      skos-thes:broaderGeneric :LES_ALIMENTS ;
40      skos:narrower :VIANDE , :JAMBON , :LARD ... .
41  :MIETTE a skos:Concept ;
42      skos:altLabel "miette"@fr ;
43      skos:scopeNote "miette"@fr ;
44      skos:inScheme :HW ;
45      skos-thes:relatedPartOf :LE_PAIN_LA_PATISSERIE .
46  :LES_NARINES a skos:Concept ;
47      skos:altLabel "les narines"@fr ;
48      skos:scopeNote "les narines"@fr ;
49      skos:inScheme :HW ;
50      skos-thes:broaderPartitive :LE_CORPS_ET_LES_MEMBRES .
Listing 2: Extract of RDF data.
¹¹ For the sake of brevity, we suppress (lines of) code that do
not add substantial value, and standard namespaces are assumed
defined the usual way, also in List. 3, 5 and 6.
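To make the shape of the concept blocks in List. 2 more tangible, the following Python sketch assembles such a block from its parts by plain string formatting. It is an illustration under stated assumptions rather than the conversion actually used: the names turtle_string, concept_to_turtle, and the keys of RELATION_PROPERTIES are hypothetical, prefixes are assumed to be declared elsewhere, and the kind of each relation is assumed to have been assessed manually beforehand.

```python
# Properties named in rule 9; the dictionary keys are hypothetical labels
# for the manually assessed relation kinds.
RELATION_PROPERTIES = {
    "generic": "skos-thes:broaderGeneric",
    "partitive": "skos-thes:broaderPartitive",
    "associative_part": "skos-thes:relatedPartOf",
}


def turtle_string(value):
    """Quote a short string literal, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def concept_to_turtle(concept_id, denomination, notation=None, relations=()):
    """Build one concept block; `relations` holds (relation kind, target id) pairs."""
    label = turtle_string(denomination) + "@fr"
    statements = [
        "a skos:Concept",
        f"skos:altLabel {label}",    # rule 6
        f"skos:scopeNote {label}",   # rule 5
    ]
    if notation is not None:         # rule 7: only the 'Plan' concepts carry one
        statements.append(
            f"skos:notation {turtle_string(notation)}^^:hwIdentificationScheme"
        )
    statements.append("skos:inScheme :HW")
    for kind, target in relations:   # rule 9
        statements.append(f"{RELATION_PROPERTIES[kind]} :{target}")
    return f":{concept_id} " + " ;\n    ".join(statements) + " ."


miette = concept_to_turtle(
    "MIETTE", "miette", relations=[("associative_part", "LE_PAIN_LA_PATISSERIE")]
)
viande = concept_to_turtle(
    "LA_VIANDE", "La viande", notation="BIk1cc1",
    relations=[("generic", "LES_ALIMENTS")],
)
print(miette)
print(viande)
```

String formatting keeps the sketch free of dependencies; for a full conversion, an RDF library that validates and serialises the graph would be the more robust choice.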
We have considered including the Lemon-tree vocabulary
into the modelling. Lemon-tree has specifically been de-
signed to model lexicographical thesaurus-like resources
as LD, bridging SKOS and the OntoLex-lemon vocabu-
lary (Stolk, 2019). Yet, for the modelling of HW, follow-
ing the examples given by Lemon-tree, only SKOS and
XKOS would be used, hence the advantage would not be
obvious.¹² The MHDBDB has created a SKOS model of
the onomasiological framework (extending HW) that structures
the data.¹³ However, its design differs significantly
from the result of our attempt: The model excludes both
the original HW identifiers and the French concept denom-
inations. Instead, concept denominations have been trans-
lated to German and English, and they are treated as lexical
terms, expressed through the SKOS property ‘prefLabel’.
The model expresses the relationships solely as hierarchi-
cal generic through SKOS ‘broader’ (not using the inverse
relation ‘narrower’, resulting in the fact that a navigation
from a top level down is not possible). In any case, it has
become clear that an LOD compliant model of HW presents
a desideratum in the discipline of historical linguistic data.
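The navigation problem can be made concrete with a small sketch. SKOS defines ‘narrower’ as the inverse of ‘broader’, so a consumer of a broader-only model has to derive the downward links itself before it can walk the hierarchy from the top. The Python lines below do this for three concepts named in List. 2; the sample pairs and the function names are illustrative assumptions and are not taken from the MHDBDB model.

```python
from collections import defaultdict

# Assumed sample of (child, broader parent) pairs. The concept names come
# from Listing 2; storing them as broader-only pairs is an illustration.
BROADER = [
    ("LA_VIANDE", "LES_ALIMENTS"),
    ("JAMBON", "LA_VIANDE"),
    ("LARD", "LA_VIANDE"),
]


def invert_broader(pairs):
    """Derive narrower links (parent -> children) from broader links."""
    narrower = defaultdict(list)
    for child, parent in pairs:
        narrower[parent].append(child)
    return dict(narrower)


def descendants(concept, narrower):
    """Collect all concepts below `concept` using the derived links."""
    found = []
    stack = list(narrower.get(concept, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(narrower.get(current, []))
    return sorted(found)


NARROWER = invert_broader(BROADER)
print(descendants("LES_ALIMENTS", NARROWER))
```

A model that states both directions explicitly, as the HW RDF/SKOS model does, spares every consumer this extra step.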
3.2. Towards an Ontological Model
The HW RDF/SKOS model is compliant with the LOD paradigm, but it
remains a representation close to the book published in 1952.
Being a KOS, it lacks conceptual abstraction, nuanced semantic
relations, and information integration for interoperability (cp.
Soergel et al. (2006)). The Web Ontology Language (OWL) (Bechhofer
et al., 2004) is a popular W3C-recommended format for expressing
ontologies, offering an alternative means for porting KOSs to the
Semantic Web. The next step is, thus, to construct an ontological
model of HW in OWL on the basis of the RDF/SKOS model. This will
allow for more expressivity and descriptiveness than offered by
SKOS relations, also preparing for future extension. The result
will be a lightweight ontology, i.e., an RDF document expressed in
OWL. Its benefit over the RDF/SKOS model is better interoperability
and the potential to serve as an extra-linguistic cross-mapping hub
for the (historical) linguistic resources that use HW concepts as
their onomasiological architecture: a lightweight ontology based on
HW enables resources such as DAGél, LEI, DEAF, and MHDBDB to create
instances of the HW classes.
¹² A linguistic resource could, however, use Lemon-tree’s object
property isSenseInConcept to relate a “lexical sense to a concept
that captures its meaning to some extent (that is, partially or
even fully)” (Stolk, 2019).
¹³ We thank Peter Hinkelmanns, MHDBDB, for making the model
available to us and for sharing thoughts on how to model HW in
SKOS.
The HW concepts meet the requirement of reflecting universal
categories, and the SKOS concepts (instances in SKOS) can thus be
represented as classes in OWL (cp. Baker et al. (2013, 38); Kless
et al. (2012b, 406-409)). This is a viable approach for creating an
ontology in OWL Full, but the result does not support inferencing.
Adding the expressive capabilities that allow for reasoning over
the ontological model requires re-engineering the SKOS model into a
formal ontology expressed in OWL DL, which we discuss briefly in
Section 5.
The syntactic conversion from the SKOS model into OWL Full is not
straightforward. The fact that thesaurus-like KOSs express concept
relations through basically only two kinds of relationships
(hierarchical and associative) makes them underspecified from the
perspective of an ontological model (Kless et al., 2012b). At the
same time, aligning specific relationships in a thesaurus with
relationships in an ontological model is not obvious and suffers
from a lack of corresponding relata; in particular, associative
relationships rarely find their matches (ib. 412). In this paper,
we demonstrate the approach of adopting the relationships expressed
by SKOS and its thesaurus extension (ib. 422): the conversion of
the concepts ordered hierarchically by the generic relation into
class/sub-class relations (expressed by means of RDFS ‘subClassOf’)
(Brickley and Guha, 2014) is straightforward;
skos-thes:broaderPartitive will be preserved for the hierarchical
whole-part relationship, and skos-thes:relatedPartOf for the
associative relationship. The lexical label can be expressed
through RDFS ‘label’; the SKOS properties ‘scopeNote’ and
‘notation’ will be preserved. We conducted a small study
representing sample data of HW as an ontological model, see List. 3.
<rdf:RDF xmlns="https://example.org/hallig-wartburg-ontology#">

  <owl:Ontology rdf:about="https://example.org/hallig-wartburg-ontology#">
    <dct:title xml:lang="en">Hallig-Wartburg Ontology</dct:title>
    <vann:preferredNamespacePrefix>hw</vann:preferredNamespacePrefix>
    <dct:description xml:lang="en">Ontology based on ...</dct:description>
    <owl:versionInfo rdf:datatype="http://www.w3.org/2001/XMLSchema#string">1.0.0</owl:versionInfo>
  </owl:Ontology>

  <!-- datatype properties -->
  <owl:DatatypeProperty rdf:about="https://lod.academy/hw-onto/ns/hw#hwIdentificationScheme">
    <rdfs:label xml:lang="en">HW Identification Scheme</rdfs:label>
    <rdfs:range>
      <rdfs:Datatype>
        <owl:oneOf>...</owl:oneOf>
      </rdfs:Datatype>
    </rdfs:range>
  </owl:DatatypeProperty>

  <!-- classes -->
  <owl:Class rdf:about="https://example.org/hallig-wartburg-ontology#LA_VIANDE">
    <skos:scopeNote xml:lang="fr">La viande</skos:scopeNote>
    <skos:notation rdf:datatype="https://lod.academy/hw-onto/ns/hw#hwIdentificationScheme">BIk1cc1</skos:notation>
    <rdfs:label xml:lang="fr">La viande</rdfs:label>
    <rdfs:subClassOf rdf:resource="https://example.org/hallig-wartburg-ontology#LES_ALIMENTS"/>
  </owl:Class>
  <owl:Class rdf:about="https://example.org/hallig-wartburg-ontology#MIETTE">
    <skos:scopeNote xml:lang="fr">miette</skos:scopeNote>
    <rdfs:label xml:lang="fr">miette</rdfs:label>
    <rdfs:subClassOf rdf:resource="https://example.org/hallig-wartburg-ontology#HWCat"/>
    <skos-thes:relatedPartOf rdf:resource="https://example.org/hallig-wartburg-ontology#LE_PAIN_LA_PATISSERIE"/>
  </owl:Class>
  <owl:Class rdf:about="https://example.org/hallig-wartburg-ontology#LES_NARINES">
    <skos:scopeNote xml:lang="fr">les narines</skos:scopeNote>
    <rdfs:label xml:lang="fr">les narines</rdfs:label>
    <rdfs:subClassOf rdf:resource="https://example.org/hallig-wartburg-ontology#HWCat"/>
    <skos-thes:broaderPartitive rdf:resource="https://example.org/hallig-wartburg-ontology#LE_CORPS_ET_LES_MEMBRES"/>
  </owl:Class>
</rdf:RDF>
Listing 3: Extract of OWL ontology (RDF/XML syntax).
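The conversion rules described before List. 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the conversion procedure actually used: the representation of a concept as a list of (predicate, object) pairs, the function name, and the decision to silently drop all other SKOS statements are assumptions. The fallback to the class HWCat for concepts without a generic parent follows what List. 3 shows for MIETTE and LES_NARINES.

```python
HW = "https://example.org/hallig-wartburg-ontology#"

# Rules from the text: the generic hierarchy becomes rdfs:subClassOf and
# the lexical label becomes rdfs:label.
RENAMED = {
    "skos-thes:broaderGeneric": "rdfs:subClassOf",
    "skos:altLabel": "rdfs:label",
}
# Properties that the text says are preserved unchanged.
PRESERVED = {
    "skos:scopeNote",
    "skos:notation",
    "skos-thes:broaderPartitive",
    "skos-thes:relatedPartOf",
}


def to_owl(name, statements):
    """Convert the (predicate, object) pairs of one SKOS concept."""
    subject = HW + name
    triples = [(subject, "rdf:type", "owl:Class")]
    for predicate, obj in statements:
        if predicate in RENAMED:
            triples.append((subject, RENAMED[predicate], obj))
        elif predicate in PRESERVED:
            triples.append((subject, predicate, obj))
        # Anything else (e.g. skos:inScheme, skos:narrower) is dropped
        # here; that is a simplification of this sketch.
    # As in Listing 3, a concept without a generic parent is attached
    # to the class HWCat.
    if not any(p == "rdfs:subClassOf" for _, p, _ in triples):
        triples.append((subject, "rdfs:subClassOf", HW + "HWCat"))
    return triples


# Sample input modelled on the MIETTE concept of Listing 2.
MIETTE = [
    ("skos:altLabel", ("miette", "fr")),
    ("skos:scopeNote", ("miette", "fr")),
    ("skos:inScheme", ":HW"),
    ("skos-thes:relatedPartOf", HW + "LE_PAIN_LA_PATISSERIE"),
]
for triple in to_owl("MIETTE", MIETTE):
    print(triple)
```

For this input the sketch yields the same five statements that List. 3 gives for MIETTE: the class declaration, the label, the scope note, the subclass link to HWCat, and the preserved relatedPartOf link.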
4. Practical Application
With the use cases of Old Gascon bacon (ham), an entry of DAGél,
and of Italian cantuccino (a twice-baked almond biscuit), an entry
of LEI, we demonstrate how a reference to a concept of the HW
ontology can be integrated into an LOD resource through the
interlinking of linguistic resources via the OntoLex-lemon
vocabulary.
Old Gascon bacon. The conversion of DAGél dictionary entries into
RDF is an automated process, broadly similar to the conversion of
DEAF (Tittel and Chiarcos, 2018). Automatically inserting a mapping
of a sense definition to the correct HW concept is straightforward,
given that a reference from each sense to HW is part of the XML
resource data, as shown in List. 4.
<m:definition>viande de porc sal&#xE9;e afin de la conserver</m:definition>
<m:cat-onomas cat="B I k cc 1">BIk1cc1/La viande</m:cat-onomas>
Listing 4: XML resource data of a DAGél entry (extract).
The content of the XML element <cat-onomas> can be
transformed into hw:LA_VIANDE, to which we can refer
through OntoLex-lemon’s object property isConceptOf,
as shown in List. 5, l. 14.
1 @prefix dag: <http://dag.adw.uni-heidelberg.de/
2 lemme/> .
3 @prefix hw: <https://example.org/hallig-wartburg-
4 ontology#> .
5
6 dag:bacon a ontolex:LexicalEntry ;
7 ontolex:sense dag:bacon_sense ;
8 ontolex:evokes dag:bacon_lexConcept ;
9 ontolex:canonicalForm dag:bacon_form .
10 dag:bacon_form a ontolex:Form ;
11 ontolex:writtenRep "bacon"@oc-x-40000006 .
12
13 dag:bacon_lexConcept a ontolex:LexicalConcept ;
14 ontolex:isConceptOf hw:LA_VIANDE ;
15 ontolex:definition "viande de porc salée afin de la
16 conserver"@fr ;
17 ontolex:lexicalizedSense dag:bacon_sense .
Listing 5: Minimal example of DAGél data (RDF/Turtle).
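How the content of the element <cat-onomas> might be turned into a concept name such as hw:LA_VIANDE can be sketched as follows. The sketch rests on assumptions: the namespace bound to the prefix m is a placeholder, since the extract in List. 4 does not show its declaration; the wrapping entry element is invented so that the fragment parses; and the naming rule (strip accents, upper-case, join words with underscores) is inferred from the identifiers in the listings and is not a documented procedure.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Placeholder namespace for the prefix "m" (not shown in Listing 4).
M_NS = "https://example.org/dag-xml#"
ENTRY = (
    f'<entry xmlns:m="{M_NS}">'
    "<m:definition>viande de porc sal&#xE9;e afin de la conserver</m:definition>"
    '<m:cat-onomas cat="B I k cc 1">BIk1cc1/La viande</m:cat-onomas>'
    "</entry>"
)


def label_to_local_name(label):
    """Assumed rule: strip accents, upper-case, join words with '_'."""
    decomposed = unicodedata.normalize("NFD", label)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    words = "".join(c if c.isalnum() else " " for c in unaccented).split()
    return "_".join(word.upper() for word in words)


def hw_concept(entry_xml):
    """Return (notation, concept name) taken from <m:cat-onomas>."""
    root = ET.fromstring(entry_xml)
    cat = root.find(f"{{{M_NS}}}cat-onomas")
    notation, _, label = cat.text.partition("/")
    return notation.strip(), "hw:" + label_to_local_name(label)


print(hw_concept(ENTRY))
```

Applied to the French labels used in List. 2, the same rule reproduces identifiers such as L_HOMME_ETRE_PHYSIQUE.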
We point out that a finer-grained concept for the Old Gascon lexeme
bacon is available, i.e., JAMBON (ham). However, DAGél only uses
the numbered concepts of HW’s ‘Plan’ (Fig. 1) and thus refers to
the super-concept LA_VIANDE. As a consequence, manual
post-processing should include replacing LA_VIANDE with JAMBON.
Please note that, in List. 5, l. 11, we use the language tag for
Old Gascon oc-x-40000006, a shortened form that expands to
oc-x-02q35735-241050--1500 using the Web application for
generating and decoding language tags at
https://londisizwe.org/language-tags/ [07-02-2020].¹⁴
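The difference between the two forms of the tag can be checked mechanically. BCP 47 (RFC 5646) restricts every subtag to between one and eight ASCII letters or digits. The sketch below tests only this one rule and is in no way a full BCP 47 validator; the function name is an assumption, and the expanded tag is used exactly as printed above.

```python
import re

# RFC 5646 (BCP 47): each subtag consists of 1 to 8 ASCII letters or digits.
SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def subtags_well_formed(tag):
    """Check only the length and character rule for every subtag."""
    return all(SUBTAG.fullmatch(part) for part in tag.split("-"))


print(subtags_well_formed("oc-x-40000006"))
print(subtags_well_formed("oc-x-02q35735-241050--1500"))
```

The shortened tag passes this check, whereas the expanded form as printed contains an empty subtag between the two consecutive hyphens and therefore fails it.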
Italian cantuccino. The digitisation of the LEI and its modelling
as LOD are still work in progress. We can, however, show a manually
created example of the entry cantuccino (LEI 10,1458,32) in List. 6.
@prefix lei: <http://www.lei-digitale.org/> .

lei:cantuccino a ontolex:LexicalEntry ;
    ontolex:sense lei:cantuccino_sense ;
    ontolex:evokes lei:cantuccino_lexConcept ;
    ontolex:canonicalForm lei:cantuccino_form .
lei:cantuccino_form a ontolex:Form ;
    ontolex:writtenRep "cantuccino"@it .

lei:cantuccino_lexConcept a ontolex:LexicalConcept ;
    ontolex:isConceptOf hw:LE_PAIN_LA_PATISSERIE ;
    ontolex:definition "un pezzetto, un ritaglio di pane dolce mandorlato"@it ;
    ontolex:lexicalizedSense lei:cantuccino_sense .
Listing 6: Minimal example of LEI data (RDF/Turtle).
HW ontology as cross-mapping hub. The integration of references to
the HW ontology is a model to be followed by other resources whose
word-sense units refer to the same HW concepts, thus establishing
the HW lightweight ontology as a cross-mapping hub and an access
point to semantically driven research that is independent of
language and word form. E.g., a database search for the string
‘pâtisserie’ within the sense definitions of all DEAFél entries
produces 46 results: friolete f. “pâtisserie légère”, fromagie f.
“pâtisserie faite de fromage et d’œufs”, etc. In DAGél, we find the
lexeme habanhas m. “pâtisserie semi-sucrée à base de fèves”.¹⁵ A
mapping of these lexemes to the corresponding HW concept
LE_PAIN_LA_PATISSERIE could thus be integrated into the LOD
versions of DEAF and DAG in an automated way, leading, in this
example, to a semantically driven, extra-linguistic cross-linking
of LEI, DAG, and DEAF.
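The hub idea amounts to grouping lexical entries from different resources by the HW concept they point to. The following sketch shows this with plain Python data. The pairs for dag:bacon and lei:cantuccino follow List. 5 and List. 6; the mappings for habanhas, friolete and fromagie, as well as the prefix deaf:, are assumptions that merely anticipate the mapping the text describes as possible.

```python
from collections import defaultdict

# (lexical entry, HW concept) pairs; see the lead-in for which ones are
# taken from the listings and which are assumed for illustration.
MAPPINGS = [
    ("dag:bacon", "hw:LA_VIANDE"),
    ("lei:cantuccino", "hw:LE_PAIN_LA_PATISSERIE"),
    ("dag:habanhas", "hw:LE_PAIN_LA_PATISSERIE"),
    ("deaf:friolete", "hw:LE_PAIN_LA_PATISSERIE"),
    ("deaf:fromagie", "hw:LE_PAIN_LA_PATISSERIE"),
]


def entries_by_concept(mappings):
    """Group entries by HW concept and, below that, by source resource."""
    hub = defaultdict(lambda: defaultdict(list))
    for entry, concept in mappings:
        resource = entry.split(":", 1)[0]
        hub[concept][resource].append(entry)
    return {concept: dict(groups) for concept, groups in hub.items()}


HUB = entries_by_concept(MAPPINGS)
for resource, entries in sorted(HUB["hw:LE_PAIN_LA_PATISSERIE"].items()):
    print(resource, entries)
```

A lookup by concept then returns lexemes from all three dictionaries without any string search in the French or Italian definitions.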
5. Discussion and Future Work
In this paper, we have argued that the modelling of HW as an LOD
resource is an important step towards resource integration and
cross-language accessibility of historical linguistic resources.
The lightweight ontology based on HW provides a model for external
resources, facilitating references for semantic mapping. However,
moving from the RDF/SKOS format towards an ontology should include
adding knowledge that enriches the model through additional
concepts, relationships, terms, and descriptive metadata. This
means adding labels in other languages and scholastic
genus–differentia definitions that help to grasp the concepts,
e.g., LA_VIANDE: “flesh of animals (including fishes and birds and
snails) used as food”. Useful resources for this task
(dictionaries, WordNet, etc.) need to be evaluated with regard to
conceptualisation incongruences and translation problems (cp.
Bizzoni et al. (2014) on the Ancient Greek WordNet); a cooperation
with MHDBDB seems promising in this regard. As a first step, we
have published the identification scheme used in Hallig-Wartburg
(as shown in List. 3), available at
https://lod.academy/hw-onto/ns/hw#.
¹⁴ ISO 639 does not provide a language code for Old Gascon and we
thus follow the pattern to create a unique and decodable language
tag described by Gillis-Webber and Tittel (2020).
¹⁵ A search for the HW concept ‘B I k 1 cc 2’ produces 21 lexemes
but is less precise, leading also to lexemes denoting flour,
sieving flour, etc.
Re-engineering the Model into a Formal Ontology. Enabling reasoning
over the HW ontology (which is not possible with the OWL Full model
demonstrated above) and introducing more expressive semantic
relations for this purpose require the SKOS model to be
re-engineered into a formal ontology. The disjointness condition
between OWL classes and individuals (the SKOS concepts) must hold
true for OWL DL; thus, any SKOS and SKOS-THES relations will need
to be removed. However, aligning the relationships expressed
through SKOS / SKOS-THES properties with OWL DL is far from obvious
(Keet and Artale, 2008; Kless et al., 2012b; Baker et al., 2013;
Adams et al., 2015). It involves finding equivalences for
hierarchical whole-part (spatial, structural, etc.) relationships
and associative relationships (e.g., action and action instrument /
results / participant / target / etc. (Kless et al., 2012b,
422f.)), and coining custom relation properties for relating
nuanced same-level and cross-level relations. Taking the
re-engineering of the AGROVOC thesaurus as an example (Baker et
al., 2019), the cost-benefit ratio of a presumably very
time-consuming task must be considered. As future work, we thus
identify a feasibility analysis of (i) re-assessing the
relationships expressed in the original HW resource, (ii) making
them explicit, and (iii) expressing them through relations in OWL.
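As a first, purely mechanical aid for such a feasibility analysis, one could list the statements in the OWL Full model in which a class is the subject of a SKOS or SKOS-THES relation pointing to another resource. The sketch below does so for triples modelled on List. 3. It performs no OWL DL validation; the tuple representation, the literal test, and the function name are assumptions made for illustration.

```python
SKOS_PREFIXES = ("skos:", "skos-thes:")

# Sample statements modelled on Listing 3, written as plain tuples.
TRIPLES = [
    ("hw:MIETTE", "rdf:type", "owl:Class"),
    ("hw:MIETTE", "skos:scopeNote", '"miette"@fr'),
    ("hw:MIETTE", "skos-thes:relatedPartOf", "hw:LE_PAIN_LA_PATISSERIE"),
    ("hw:LES_NARINES", "rdf:type", "owl:Class"),
    ("hw:LES_NARINES", "skos-thes:broaderPartitive", "hw:LE_CORPS_ET_LES_MEMBRES"),
]


def is_literal(term):
    """In this sketch, literals are the terms that start with a quote."""
    return term.startswith('"')


def relations_to_reengineer(triples):
    """List SKOS / SKOS-THES relations that link a class to a resource."""
    classes = {s for s, p, o in triples if p == "rdf:type" and o == "owl:Class"}
    return [
        (s, p, o)
        for s, p, o in triples
        if s in classes and p.startswith(SKOS_PREFIXES) and not is_literal(o)
    ]


for triple in relations_to_reengineer(TRIPLES):
    print(triple)
```

Each statement reported this way is a candidate for step (iii), i.e. for being re-expressed through a relation defined in OWL.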
Insufficient Scope and Granularity of HW concepts. HW shows
significant shortcomings that hamper an accurate semantic mapping,
reducing its relevance as an extra-linguistic cross-mapping hub.
The scope and granularity of HW’s categories do not suffice for
modelling the lexical units of an entire language: HW is poorly
suited to the mapping of the so-called small words (e.g., pronouns,
articles). The differentiation is also inadequate: HW is primarily
geared to general language and lacks technical precision, e.g., in
fields like ‘L’astronomie’ and ‘La biologie’, each of which is
reduced to a single concept.
Insufficient Possibilities for Depicting Historical Life. Regional
and cultural imprints through time go hand in hand with semantic
shift. HW, like other extra-linguistic conceptualisations of the
world such as DBpedia, depicts modern reality. To map Old Italian
àghila to HW ‘aigle’ or DBpedia ‘Eagle’¹⁶ is straightforward.
However, with language change and semantic shift, many problems
arise that make the semantic mapping from a lexeme in a (medieval)
historical linguistic resource to an entity of a conceptual model
of the modern world difficult: (i) things (abstract or real)
denoted by medieval words do not exist anymore, (ii) words are
extinct and, thus, the concepts denoted by them are hard to
identify in a modern world ontology, e.g., Old French jaonoi m.
“gorse-covered terrain”, DEAF J 398,30, (iii) meanings of words are
extinct, and their modern equivalent is not obvious, e.g., Old
French jambe f. (“leg”, and also:) “post that serves as a support
(for a door lintel, a mantelpiece, a vault, etc.)”, DEAF J 94,15,
and (iv) meanings have undergone semantic shift and the underlying
concept is clearly different from the one symbolised by the
corresponding modern word. E.g., the veine was considered a sort of
blood vessel that transports the ‘nourishing blood’ from the liver
to each part of the body, and the sperm designated both the male
and the female generative cell, etc.¹⁷ Hence, a mapping to the
modern concepts of ‘veine’ and ‘sperm’ is not possible without
causing semantic discrepancies. We refer to this circumstance as
the Historical Semantic Gap. Khan et al. (2014) address the issue
of modelling semantic shift by extending the OntoLex-lemon
vocabulary with a time interval to capture different concepts of
one lexeme through time. This approach is a major enhancement from
the point of view of historical linguistics. However, it does not
solve the problem of semantic mapping to an extra-linguistic
conceptual model where the historical concept is not represented.
¹⁶ http://dbpedia.org/page/Eagle [10-02-2020].
To stabilise HW’s role as an onomasiological reference sys-
tem for historical (linguistic) resources, it must be elabo-
rated in two ways: The net of concepts must be refined
and concepts with historically appropriate content must be
added. We call the latter process the historicisation of HW.
To prepare for a future extension towards historicised con-
tent, we foresee a class HistCat and a symmetric (object)
property hasModernCounterpart, cf. List. 7.
<owl:SymmetricProperty rdf:about="https://example.org/hallig-wartburg-ontology#hasModernCounterpart">
  <rdfs:label xml:lang="en">has modern counterpart</rdfs:label>
</owl:SymmetricProperty>

<owl:Class rdf:about="https://example.org/hallig-wartburg-ontology#HistCat">
  <rdfs:label xml:lang="en">historicised concept</rdfs:label>
</owl:Class>
Listing 7: Added property and class to HW ontology.
HW presents few categories that reflect historical periods: only
four concepts include the notion of ‘ancient’, e.g. ‘Les armes
anciennes’ (early weapons, next to ‘Les armes modernes’) and ‘Les
bâtiments de guerre anciens’ (early warships, next to ‘Les
bâtiments de guerre modernes’). With the added class and object
property, the class LES_ARMES_ANCIENNES, e.g., can be defined as a
subclass of HistCat and refer to LES_ARMES_MODERNES through the
property hasModernCounterpart. This would, thus, support the use
of HW as an onomasiological framework by both historical and modern
resources.
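What the declaration of hasModernCounterpart as a symmetric property implies can be illustrated with a small sketch. The two asserted statements follow the example in the text; the tuple representation and the function name are assumptions, and the code only mimics the single inference rule for symmetric properties, not a full OWL reasoner.

```python
# Statements as suggested in the text, written as plain tuples.
ASSERTED = {
    ("hw:LES_ARMES_ANCIENNES", "rdfs:subClassOf", "hw:HistCat"),
    ("hw:LES_ARMES_ANCIENNES", "hw:hasModernCounterpart", "hw:LES_ARMES_MODERNES"),
}
SYMMETRIC = {"hw:hasModernCounterpart"}


def symmetric_closure(triples, symmetric_properties):
    """Add the reverse statement for every symmetric property."""
    closed = set(triples)
    for s, p, o in triples:
        if p in symmetric_properties:
            closed.add((o, p, s))
    return closed


CLOSED = symmetric_closure(ASSERTED, SYMMETRIC)
for triple in sorted(CLOSED - ASSERTED):
    print(triple)
```

The derived statement links LES_ARMES_MODERNES back to LES_ARMES_ANCIENNES, so a resource that starts from the modern concept can reach the historicised one through the same property.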
¹⁷ DEAFpré VEINE1,
https://deaf-server.adw.uni-heidelberg.de/lemme/veine1; ESPERME
.../lemme/esperme [25-02-2020].
6. Acknowledgements
The work of Frances Gillis-Webber was financially sup-
ported by Hasso Plattner Institute for Digital Engineering.
7. Bibliographical References
Adams, D., Jansen, L., and Milton, S. (2015). A content-
focused method for re-engineering thesauri into seman-
tically adequate ontologies. Semantic Web, 09.
Baker, T., Bechhofer, S., Isaac, A., Miles, A., Schreiber, G.,
and Summers, E. (2013). Key choices in the design of
Simple Knowledge Organization System (SKOS). Jour-
nal of Web Semantics, 20:35 – 49.
Baker, T., Whitehead, B., Musker, R., and Keizer, J.
(2019). Global agricultural concept space: lightweight
semantics for pragmatic interoperability. npj Science of
Food, 3, 12.
Baldinger, K. (1959). s.v. Romanistik. Deutsche Liter-
aturzeitung, 80:1090–1093.
Baldinger, K. (1975 to 2005). Dictionnaire onomasiologique de
l’ancien occitan – DAO (fondé par Kurt Baldinger, rédigé par Inge
Popelar, puis Bernard Henschel, puis Nicoline Hörsch/Winkler et
Tiana Shabafrouz). Niemeyer [Heidelberger Akademie der
Wissenschaften / Kommission für das Altokzitanische und
Altgaskognische Wörterbuch], Tübingen.
Baldinger, K. (since 1971). Dictionnaire étymologique de l’ancien
français – DEAF. Presses de L’Université Laval / Niemeyer / De
Gruyter, Québec/Tübingen/Berlin. [continued by Frankwalt Möhren,
and Thomas Städtler; DEAFél:
https://deaf-server.adw.uni-heidelberg.de].
Baldinger, K. (since 1975). Dictionnaire onomasiologique de
l’ancien gascon – DAG (fondé par Kurt Baldinger, dirigé par Inge
Popelar, puis Nicoline Hörsch / Winkler et Tiana Shabafrouz, sous
la direction de Jean-Pierre Chambon, puis Martin Glessgen). De
Gruyter [Heidelberger Akademie der Wissenschaften / Kommission für
das Altokzitanische und Altgaskognische Wörterbuch], Tübingen /
Berlin.
Bechhofer, S., van Harmelen, F., Hendler, J., Horrocks, I.,
McGuinness, D. L., Patel-Schneider, P. F., and Stein, L. A. (2004).
OWL Web Ontology Language. Reference. W3C Recommendation 10
February 2004. URL: https://www.w3.org/TR/2004/REC-owl-ref-20040210/
[accessed: 09-02-2020].
Bevans, C. (1941). The Old French vocabulary of Cham-
pagne. A descriptive study based on localized and dated
documents. University of Chicago Libraries, Chicago.
Bizzoni, Y., Boschetti, F., Diakoff, H., Gratta, R. D., Mona-
chini, M., and Crane, G. (2014). The Making of Ancient
Greek WordNet. In Proceedings of the Ninth Interna-
tional Conference on Language Resources and Evalua-
tion (LREC’14), pages 1140–1147, Reykjavik, Iceland.
European Language Resources Association (ELRA).
Brickley, D. and Guha, R. (2014). RDF Schema 1.1. W3C
Recommendation 25 February 2014. URL:
https://www.w3.org/TR/rdf-schema/ [accessed: 13-02-2020].
Chiarcos, C., McCrae, J., Cimiano, P., and Fellbaum, C.
(2013). Towards Open Data for Linguistics: Lexical
Linked Data. In Alessandro Oltramari, et al., editors,
New Trends of Research in Ontologies and Lexical Re-
sources: Ideas, Projects, Systems, pages 7–25. Springer,
Berlin, Heidelberg.
Christmann, H. and Böckle, K. (1983). Bespr. von Schwake, Der
Wortschatz des Cliges. Zeitschrift für romanische Philologie,
99:397–403.
Cimiano, P., McCrae, J. P., and Buitelaar, P. (2016). Lexicon Model
for Ontologies: Community Report, 10 May 2016. Final Community
Group Report 10 May 2016. URL: https://www.w3.org/2016/05/ontolex/
[accessed: 10-02-2020].
Cyganiak, R., Wood, D., and Lanthaler, M. (2014). RDF 1.1. concepts
and abstract syntax: W3C recommendation 25 February 2014. URL:
https://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/
[11-02-2020].
Cyganiak, R., Gillman, D., Grim, R., Jaques, Y., and Thomas, W.
(2017). An SKOS extension for representing statistical
classifications, ed. F. Cotton, Unofficial Draft, 1 January 2017.
URL: http://www.ddialliance.org/Specification/XKOS/1.0/OWL/xkos.html
[accessed: 07-02-2020].
Davidson, G. (2002). Roget’s thesaurus of English
words and phrases. (150th anniversary edition). Penguin
Books, London.
de Man, L. (1956). Bijdrage tot een systematisch glossar-
ium van de Brabantse oorkondentaal. Leuvens Archief
van circa 1300 to 1550. Deel, I.
Dornseiff, F. (1934). Der deutsche Wortschatz nach Sach-
gruppen. de Gruyter, Berlin.
Fellbaum, C. (1998). WordNet: An Electronic Lexical
Database. MIT Press, Cambridge, MA.
Gillis-Webber, F. and Tittel, S. (2020). A Framework for
Shared Agreement of Language Tags beyond ISO 639.
In Proceedings of LREC 2020 [accepted paper], N.N.
Glessgen, M. and Tittel, S. (2018). Le Dictionnaire électronique de
l’ancien gascon (DAGél). In Roberto Antonelli, et al., editors,
Atti del XXVIII Congresso internazionale di linguistica e filologia
romanza (Roma, 18-23 luglio 2016), volume 1, pages 805–818. Société
de Linguistique Romane / Éditions de linguistique et de philologie
ELiPi, Bibliothèque de Linguistique Romane 15,1.
Glessgen, M. (since 2014). Dictionnaire de l’ancien gascon – DAGél.
(en collaboration avec Sabine Tittel) URL:
https://dag.adw.uni-heidelberg.de/ [accessed: 02-02-2020].
Hallig, R. and von Wartburg, W. (1963). Begriffssystem als
Grundlage für die Lexikographie / Système raisonné des concepts
pour servir de base à la lexicographie. Akademie-Verlag, Berlin.
[first edition 1952].
Helou, M., Jarrar, M., Palmonari, M., and Fellbaum, C.
(2014). Towards building lexical ontology via cross-
language matching. In GWC 2014: Proceedings of the
7th Global Wordnet Conference, pages 346–354.
Hinkelmanns, P. (2019). Mittelhochdeutsche Lexikogra-
phie und Semantic Web. Die Anbindung der ‘Mit-
telhochdeutschen Begriffsdatenbank’ an Linked Open
Data. Das Mittelalter, 24(1):129–141.
International Organization for Standardization. (2011). In-
ternational Standard ISO 25964-1:2011, Information and
documentation – Thesauri and interoperability with other
vocabularies – Part 1: Thesauri for information retrieval,
Part 2: Interoperability with other vocabularies. URL:
https://www.iso.org/standard/53657.html.
Kay, C. (2009). Historical thesaurus of the Oxford English
dictionary. Oxford University Press, Oxford.
Keet, C. and Artale, A. (2008). Representing and reason-
ing over a taxonomy of part-whole relations. Applied
Ontology, 3:91–110.
Keller, H.-E. (1953). Etude descriptive sur le vocabulaire de Wace.
Akad. der Wiss. Berlin, Veröffentl. Inst. für Rom. Spr.wiss. 7,
Berlin.
Khan, F., Frontini, F., and Boschetti, F. (2014). Using
lemon to Model Lexical Semantic Shift in Diachronic
Lexical Resources. In Proceedings of the 3rd Workshop
on Linked Data in Linguistics: Multilingual Knowledge
Resources and Natural Language, Reykjavik, Iceland.
Kless, D., Jansen, L., Lindenthal, J., and Wiebensohn, J.
(2012a). A method for re-engineering a thesaurus into an
ontology. Frontiers in Artificial Intelligence and Appli-
cations, DOI 10.3233/978-1-61499-084-0-133:133–146.
Kless, D., Milton, S., and Kazmierczak, E. (2012b). Re-
lationships and Relata in Ontologies and Thesauri: Dif-
ferences and Similarities. Applied Ontology, 7:401–428,
11.
Miles, A. and Bechhofer, S. (2009). SKOS Simple Knowledge
Organization System reference: W3C recommendation 18 August 2009.
URL: https://www.w3.org/TR/2009/REC-skos-reference-20090818/
[accessed: 07-02-2020].
Miles, A. and Brickley, D. (2004). SKOS Extensions Vocabulary
Specification. URL:
www.w3.org/2004/02/skos/extensions/spec/2004-10-18.html
[20-02-2020].
Nannini, A. (in progress). La mappatura semantica del
Lessico Etimologico Italiano (LEI). Doctoral thesis.
Pfister, M. (since 1979). Lessico Etimologico Italiano
– LEI. Reichert, Wiesbaden. [2001– together with
W. Schweickard, 2018– W. Schweickard together with
E. Prifti].
Prifti, E. (2019). Lo stato della digitalizzazione del LEI.
Un resoconto. In Lino Leonardi et al., editors, Italiano
antico, italiano plurale. Testi e lessico del Medioevo nel
mondo digitale, page [in print]. N.N., Firenze.
Prud’hommeaux, E. and Carothers, G. (2014). RDF 1.1 Turtle: Terse
RDF Triple Language. W3C Recommendation, 25 February 2014. URL:
http://www.w3.org/TR/turtle/ [accessed: 07-02-2020].
Renders, P. (2015). L’informatisation du Französisches
Etymologisches Wörterbuch. Modélisation d’un discours étymologique.
ELIPHI, Strasbourg.
Renders, P. (2019). Integrating the Etymological Dimen-
sion into the Onto-Lex Lemon Model: A Case of Study.
In Electronic lexicography in the 21st century (eLEX
2019). Book of Abstracts, pages 71–72.
Schmidt, K. (1980). Begriffsglossare und Indices zu Ulrich von
Lichtenstein. Indices zur deutschen Literatur 14/15. Kraus
International Publications, München.
Schmidt, K. (1988). Der Beitrag der Begriffsorientierten
Lexikographie zur systematischen Erfassung von Sprachwandel und das
Begriffswörterbuch zur Mhd. Epik. In Wolfgang Bachofer, editor,
Mittelhochdeutsches Wörterbuch in der Diskussion. Symposion zur
mittelhochdeutschen Lexikographie, Hamburg, Oktober 1985, pages
35–49, Tübingen. Niemeyer.
Schmidt, K. (1993). Begriffsglossar und Index zu Ulrich von
Zatzikhoven Lanzelet. Indices zur deutschen Literatur 25. Niemeyer,
Tübingen.
Soergel, D., Lauser, B., Liang, A., Fisseha, F., Keizer, J.,
and Katz, S. (2006). Reengineering Thesauri for New
Applications: the AGROVOC Example. Journal of Dig-
ital Information, 4(4).
Stempel, W.-D. (1996 to 2013). Dictionnaire de l’occitan médiévale
– DOM. Niemeyer / De Gruyter, Tübingen/Berlin. [continued by Maria
Selig; electr. version: http://www.dom-en-ligne.de].
Stolk, S. (2019). Lemon-tree. Document 30 March 2019. Latest
editor’s draft. URL:
https://ssstolk.github.io/onto/lemon-tree/index.html [accessed:
07-02-2020].
Tancke, G. (1997). Note per un avviamento al Lessico Etimologico
Italiano (LEI). In Günter Holtus, et al., editors, Italica et
Romanica. Festschrift für Max Pfister zum 65. Geburtstag, pages
457–487. Niemeyer, Tübingen.
Tittel, S. and Chiarcos, C. (2018). Historical Lexicography of Old
French and Linked Open Data: Transforming the Resources of the
Dictionnaire étymologique de l’ancien français with OntoLex-Lemon.
In Proceedings of the Eleventh International Conference on Language
Resources and Evaluation (LREC 2018). GLOBALEX Workshop
(GLOBALEX-2018), Miyazaki, Japan, 2018, pages 58–66, Paris (ELRA).
Tittel, S. (in progress). Integration von historischer
lexikalischer Semantik und Ontologien in den Digital
Humanities. Habilitation thesis.
von Wartburg, W. (since 1922). Französisches Etymologisches
Wörterbuch. Eine darstellung des galloromanischen sprachschatzes –
FEW. ATILF. [continued by O. Jänicke, C.T. Gossen, J.-P. Chambon,
J.-P. Chauveau, and Yan Greub].