Figure - uploaded by Frances Gillis-Webber
Figure 2: (a) Approximate region where the patois of Sainte-Sabine was recorded. (b) Region of Vimeu in Picardy.

Source publication
Conference Paper
Full-text available
When modelling linguistic resources as Linked Data, the identification of languages using language tags and language codes is a mandatory task. IETF's BCP 47 defines the standard for tags, and ISO 639 provides the codes. However, these codes are insufficient for the identification of diatopic variation within a language and, also, for different his...

Citations

... Nevertheless, NIF has been used as a publication format for corpora with entity annotations (footnote 121). NIF continues to be a popular component of the DBpedia technology stack. At the same time, active development of NIF seems to have slowed down since the mid-2010s, and only limited progress on NIF standardization has been achieved. ...
... The core data structure of the Web Annotation Data Model is the annotation, i.e., instances of oa:Annotation that have an oa:hasTarget property that identifies the element that carries the annotation, and the oa:has- (footnote 121: The most prominent example, the NIF edition of the Brown corpus published in 2015, formerly available from http://brown.nlp2rdf.org/, ...)
... lang-subtags-templates.xhtml (footnote 166: Cf. https://github.com/w3c/i18n-discuss/issues/13) ... very notion of language tags has been criticised as both too inflexible and unable to address the needs of linguistics, e.g., recently by [120,121], and alternatives are being explored [122]. URI-based language identification represents a natural alternative in such cases, as URIs are not tied to any single standardization body or maintainer, but allow the marking of both the organization or maintainer of the resource (as part of the namespace) and the individual language (in the local name). ...
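The namespace/local-name split described in the excerpt above can be illustrated with a minimal Python sketch. The helper name `split_language_uri` is hypothetical, and the Lexvo URI is used only as an example of a URI-based language identifier; it is not taken from the cited article.

```python
from typing import Tuple


def split_language_uri(uri: str) -> Tuple[str, str]:
    """Split a language URI into (namespace, local name).

    Assumption: the local name is whatever follows the last '#' or '/'.
    """
    cut = max(uri.rfind("#"), uri.rfind("/"))
    if cut < 0 or cut == len(uri) - 1:
        raise ValueError(f"no local name found in {uri!r}")
    # The namespace keeps its trailing separator so the two parts
    # concatenate back to the original URI.
    return uri[: cut + 1], uri[cut + 1 :]


# Illustrative identifier: the namespace names the maintainer,
# the local name names the individual language.
namespace, language = split_language_uri("http://lexvo.org/id/iso639-3/fra")
```

Here `namespace` identifies the organization maintaining the identifiers and `language` the individual language, which is the property the excerpt attributes to URI-based identification.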
Article
Full-text available
This article provides a comprehensive and up-to-date survey of models and vocabularies for creating linguistic linked data (LLD), focusing on the latest developments in the area and both building upon and complementing previous works covering similar territory. The article begins with an overview of some recent trends which have had a significant impact on linked data models and vocabularies. Next, we give a general overview of existing vocabularies and models for different categories of LLD resource. We then look at some of the latest developments in community standards and initiatives, including descriptions of recent work on the OntoLex-Lemon model, a survey of recent initiatives in linguistic annotation and LLD, and a discussion of the LLD metadata vocabularies META-SHARE and lime. In the next part of the paper, we focus on the influence of projects on LLD models and vocabularies, starting with a general survey of relevant projects, before dedicating individual sections to a number of recent projects and their impact on LLD vocabularies and models. Finally, in the conclusion, we look ahead at some future challenges for LLD models and vocabularies. The appendix to the paper consists of a brief introduction to the OntoLex-Lemon model.
... Searching for and modelling diachronic change requires rethinking some contemporary (Semantic) Web infrastructure. As [190] shows, standardised language tags cannot capture the differences between Old, Middle and Modern French resources. Digital editions, often modelled in TEI [191], are a rich resource of diachronic language variation. ...
Article
Full-text available
This paper presents an overview of the LL(O)D and NLP methods, tools and data for detecting and representing semantic change, with its main application in humanities research. The paper’s aim is to provide the starting point for the construction of a workflow and set of multilingual diachronic ontologies within the humanities use case of the COST Action Nexus Linguarum, European network for Web-centred linguistic data science, CA18209. The survey focuses on the essential aspects needed to understand the current trends and to build applications in this area of study.
Conference Paper
Full-text available
The identification and annotation of languages in an unambiguous and standardized way is essential for the description of linguistic data. It is the prerequisite for machine-based interpretation, aggregation, and re-use of the data with respect to different languages. This makes it a key aspect especially for Linked Data and the multilingual Semantic Web. The standard for language tags is defined by IETF's BCP 47, and ISO 639 provides the language codes that are the tags' main constituents. However, for the identification of lesser-known languages, endangered languages, regional varieties or historical stages of a language, the ISO 639 codes are insufficient. Also, the optional language sub-tags compliant with BCP 47 are not fine-grained enough to represent linguistic variation. We propose a versatile pattern that extends the BCP 47 privateuse sub-tag and is, thus, able to overcome the limits of BCP 47 and ISO 639. Sufficient coverage of the pattern is demonstrated with the use case of linguistic Linked Data of the endangered Gascon language. We show how to use a URI shortcode for the extended sub-tag, making the length compliant with BCP 47. We achieve this with a web application and API developed to encode and decode the language tag.
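To make the length constraint concrete: in BCP 47, private-use subtags follow the singleton `x`, and each subtag is limited to 1 to 8 ASCII letters or digits. The Python sketch below only checks that constraint and assembles a tag. It is not the authors' web application or API; the function name `with_private_use` and the shortcode `ab12cd` are hypothetical, and the base tag is assumed to be valid already.

```python
import re

# BCP 47 (RFC 5646): each private-use subtag is 1 to 8 ASCII alphanumerics.
PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def with_private_use(base_tag: str, *subtags: str) -> str:
    """Append private-use subtags to a base tag after the 'x' singleton.

    Assumption: base_tag is already well-formed and has no private-use part.
    """
    if not subtags:
        raise ValueError("at least one private-use subtag is required")
    for subtag in subtags:
        if not PRIVATE_SUBTAG.fullmatch(subtag):
            raise ValueError(f"invalid private-use subtag: {subtag!r}")
    return "-".join([base_tag, "x", *subtags])


# 'oc' is the ISO 639-1 code for Occitan; 'ab12cd' is a made-up shortcode
# standing in for an identifier of a finer-grained variety.
tag = with_private_use("oc", "ab12cd")
```

A full URI would exceed the eight-character limit per subtag, which is why a shortcode that can be decoded back to the URI is needed for the tag to remain well-formed.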