Christian Chiarcos

Goethe-Universität Frankfurt am Main · Angewandte Computerlinguistik (ACoLi), Institut für Informatik

Prof. Dr.

About

111
Publications
27,195
Reads
1,182
Citations
Introduction
Focus areas: (1) Natural Language Semantics, with a focus on semantic parsing (Information Extraction / Text Analytics) and deep (discourse) semantics, (2) Data Science and Knowledge Representation, with a focus on information integration and interoperability for knowledge processing and natural language processing, (3) Artificial Intelligence and Machine Learning, with a focus on Semantic Web technologies and neural networks, and (4) Digital Humanities and Computational Philology as an important area of application for such methods, e.g., with respect to historical or low resource languages.


Publications (111)
Article
Full-text available
This article provides a comprehensive and up-to-date survey of models and vocabularies for creating linguistic linked data (LLD) focusing on the latest developments in the area and both building upon and complementing previous works covering similar territory. The article begins with an overview of some recent trends which have had a significant im...
Chapter
In this paper, we present cqp4rdf, a set of tools for creating and querying corpora with linguistic annotations. cqp4rdf builds on CQP, an established corpus query language widely used in the areas of computational lexicography and empirical linguistics, and allows it to be applied to corpora represented in RDF. This is in line with the emerging trend o...
Chapter
Full-text available
We describe the use of linguistic linked data to support a cross-lingual transfer framework for sentiment analysis in the pharmaceutical domain. The proposed system dynamically gathers translations from the Linked Open Data (LOD) cloud, particularly from Apertium RDF, in order to project a deep learning-based sentiment classifier from one language...
Conference Paper
Full-text available
We describe the use of linguistic linked data to support a cross-lingual transfer framework for sentiment analysis in the pharmaceutical domain. The proposed system dynamically gathers translations from the Linked Open Data (LOD) cloud, particularly from Apertium RDF, in order to project a deep learning-based sentiment classifier from one language...
Conference Paper
Full-text available
With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; (2) composition of cross-platform service workflows. We devise five different levels (of increasing complexity) of platform interoperability that we suggest implementing in a wider...
Conference Paper
We introduce the Flexible and Integrated Transformation and Annotation eNgineering (Fintan) platform for converting heterogeneous linguistic resources to RDF. With its modular architecture, workflow management and visualization features, Fintan facilitates the development of complex transformation pipelines by integrating generic RDF converters and...
Preprint
Full-text available
With regard to the wider area of AI/LT platform interoperability, we concentrate on two core aspects: (1) cross-platform search and discovery of resources and services; (2) composition of cross-platform service workflows. We devise five different levels (of increasing complexity) of platform interoperability that we suggest implementing in a wider...
Conference Paper
Full-text available
In this paper we describe the contributions made by the European H2020 project "Prêt-à-LLOD" ('Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors') to the further development of the Linguistic Linked Open Data (LLOD) infrastructure. Prêt-à-LLOD aims to develop a new methodology for building data value chains applic...
Chapter
LD technologies allow metadata of datasets to be exposed on the Web in order to improve their automated discovery, sharing and reuse by humans and software agents. In this chapter we deal with the representation of metadata for LRs, with the idea of enabling their cataloguing, discovery and later reuse. We will distinguish two types of metadata: ge...
Chapter
This chapter introduces the Lexicon Model for Ontologies (lemon) as defined by the Ontolex W3C community group. The model was originally developed to enrich ontologies with lexical information expressing how the elements of the ontology including classes, properties and individuals are referred to in a given language. In this chapter we cover the c...
Chapter
In previous chapters, we discussed how to model linguistic data sets using the Resource Description Framework as a basis to publish them as linked data on the Web. In this chapter, we describe a methodology that can be followed in the transformation of legacy linguistic datasets into linked data. The methodology comprises different tasks, includ...
Chapter
This chapter introduces the Linguistic Linked Open Data (LLOD) Cloud. In recent years, there has been increasing interest in publishing linguistic datasets following linked data principles. A number of community-driven activities, foremost organized by the Open Linguistics Working Group (OWLG), have fostered and supported the publication of open li...
Chapter
This chapter introduces preliminaries that are essential to follow the content in the remainder of this book. First of all, we introduce the core data model of the Semantic Web and linked data, that is the Resource Description Framework, RDF. This format was designed in the 1990s and its core purpose is to represent data and knowledge in a Web-comp...
Chapter
In this chapter we address the question of how links can be discovered between different datasets published as Linguistic Linked Open Data. We describe common patterns to represent links both between data that are in the same language (monolingual scenario) and between data in different languages (cross-lingual scenario). Further, we describe techn...
Chapter
The (re-)usability of NLP tools and language resources has long been recognized as a key challenge in the language resource and NLP communities. Reuse of resources, however, requires a minimum level of interoperability, and in this chapter, we focus on conceptual interoperability, i.e. harmonization between different annotation schemas by means of...
Chapter
This chapter describes how linguistic annotations can be represented in RDF. Web Annotation and NIF provide the means to reference text segments on the web. Yet, representing linguistic annotations requires appropriate vocabularies. We discuss relevant vocabularies and illustrate how they can be applied to support annotation at different levels.
Chapter
Text annotation consists in defining markables (elements to be annotated), their features (attributes and values of annotations) and relations between markables (e.g. syntactic dependencies or semantic links). In this chapter we describe the principles for annotating text data using RDF-compliant formalisms. These principles provide the basis for m...
Chapter
The Linguistic Linked Data (LLD) paradigm was introduced about 8 years ago by the Open Linguistics Working Group (OWLG). The original mission of this group was to (1) promote the use of open standards in linguistics; (2) act as a central point of reference and provide support for those interested in open linguistic data; (3) develop best practices...
Chapter
In this chapter we describe principles and architectures that support the development of NLP workflows and pipelines based on linked data technology. The benefit of NLP workflows that build on linked data standards is that they build on an open set of data models and Web technologies that can be implemented with standard functionality not requiring...
Chapter
Finding appropriate language resources for a particular research purpose or task is of crucial importance and represents a significant challenge at the same time. Currently, there are a number of distributed data repositories which contain metadata about many language resources. However, the metadata formats and metadata content are not harmonized a...
Chapter
In recent years, Digital Humanities (DH) has become an increasingly flourishing field of research, often posing novel research challenges that require extensions or revisions of existing technologies. One characteristic of this area is the great heterogeneity of scientific disciplines and user communities involved. This leads to heterogeneity of da...
Chapter
Wordnets are the most widely used lexical resources in natural language processing (NLP). There exist wordnets in more than 40 languages by now and all of these are connected to the original Princeton WordNet. The origins of linguistic linked data (LD) can thus in some sense be traced to the WordNet project. The implementation of the linking, howev...
Book
This is the first monograph on the emerging area of linguistic linked data. Presenting a combination of background information on linguistic linked data and concrete implementation advice, it introduces and discusses the main benefits of applying linked data (LD) principles to the representation and publication of linguistic resources, arguing that...
Book
This open access volume (https://direct.mit.edu/books/book/4618/Development-of-Linguistic-Linked-Open-Data) examines the challenges inherent in making diverse data in linguistics and the language sciences open, distributed, integrated, and accessible, thus fostering wide data sharing and collaboration. It is unique in integrating the perspectives o...
Chapter
Full-text available
An overview of and an introduction to the use of Open Data, Linked Data and Linked Open Data in the field of Linguistics, focusing on the area of Linguistic Linked Open Data (LLOD).
Article
Full-text available
This paper describes work on the morphological and syntactic annotation of Sumerian cuneiform as a model for low resource languages in general. Cuneiform texts are invaluable sources for the study of history, languages, economy, and cultures of Ancient Mesopotamia and its surrounding regions. Assyriology, the discipline dedicated to their study, ha...
Conference Paper
Full-text available
The adaptation of novel techniques and standards in computational lexicography is taking place at an accelerating pace, as manifested by recent extensions beyond the traditional XML-based paradigm of electronic publication. One important area of activity in this regard is the transformation of lexicographic resources into (Linguistic) Linked Open D...
Conference Paper
Full-text available
This paper presents an endeavor to transform a scholarly text edition (of a medical treatise written in Middle French) into a digital edition enriched with references to an on-line dictionary. Hitherto published as a book, the resulting digital edition will use RDFa to interlink its vocabulary with the corresponding lexical entries of the Dictionna...
Chapter
Full-text available
The physical formats used to represent linguistic data and its annotations have evolved over the past four decades, accommodating different needs and perspectives as well as incorporating advances in data representation generally. This chapter provides an overview of representation formats with the aim of surveying the relevant issues for represent...
Conference Paper
Full-text available
Understanding the differences underlying the scope, usage and content of language data requires the provision of a clarifying terminological basis which is integrated in the metadata describing a particular language resource. While terminological resources such as the SIL Glossary of Linguistic Terms, ISOcat or the GOLD ontology provide a considera...
Conference Paper
Full-text available
We introduce CoNLL-RDF, a direct rendering of the CoNLL format in RDF, accompanied by a formatter whose output mimics CoNLL’s original TSV-style layout. CoNLL-RDF represents a middle ground that accounts for the needs of NLP specialists (easy to read, easy to parse, close to conventional representations), but that also facilitates LLOD integration...
Conference Paper
Interlinear glossed text (IGT) is a notation used in various fields of linguistics to provide readers with a way to understand the linguistic phenomena. We describe the representation of IGT data in RDF, the conversion from two popular tools, and their automated linking with resources from the Linguistic Linked Open Data (LLOD) cloud. We argue that...
Article
Full-text available
We introduce an attention-based Bi-LSTM for Chinese implicit discourse relations and demonstrate that modeling argument pairs as a joint sequence can outperform word order-agnostic approaches. Our model benefits from a partial sampling scheme and is conceptually simple, yet achieves state-of-the-art performance on the Chinese Discourse Treebank. We...
Book
This book constitutes the combined refereed proceedings of ISWC Satellite Workshops KEKI and NLP&DBpedia 2016 which were held in conjunction with ISWC 2016 in Kobe, Japan, in October 2016. The 9 papers presented were carefully selected and reviewed from 20 submissions. They focus on the use of linguistic linked open data, the linguistic aspects of...
Book
This book constitutes the proceedings of the First International Conference on Language, Data and Knowledge, LDK 2017, held in Galway, Ireland, in June 2017. The 14 full papers and 19 short papers included in this volume were carefully reviewed and selected from 68 initial submissions. They deal with language data; knowledge graphs; applications in...
Conference Paper
Full-text available
The Open Linguistics Working Group (OWLG) brings together researchers from various fields of linguistics, natural language processing, and information technology to present and discuss principles, case studies, and best practices for representing, publishing and linking linguistic data collections. A major outcome of our work is the Linguistic Link...
Conference Paper
Full-text available
We present experiments on word segmentation for Akkadian cuneiform, an ancient writing system and a language used for about 3 millennia in the ancient Near East. To the best of our knowledge, this is the first study of this kind applied to either the Akkadian language or the cuneiform writing system. As a logosyllabic writing system, cuneiform structurall...
Article
Full-text available
This paper describes the Ontologies of Linguistic Annotation (OLiA) as one of the data sets currently available as part of the Linguistic Linked Open Data (LLOD) cloud. Within the LLOD cloud, the OLiA ontologies serve as a reference hub for annotation terminology for linguistic phenomena across a broad range of languages; they have been used to facili...
Article
Full-text available
We introduce lemonUby, a new lexical resource integrated in the Semantic Web which is the result of converting data extracted from the existing large-scale linked lexical resource UBY to the lemon lexicon model. The following data from UBY were converted: WordNet, FrameNet, VerbNet, English and German Wiktionary, the English and German entries of O...
Poster
Full-text available
With our poster and the accompanying demo, we present current progress on the information-technological support for scholars and students of cuneiform. For a period of about 3000 years, cuneiform was the dominant writing system of the Ancient Near East, with a rich literary tradition in several languages, and an extensive amount of texts preserved...
Conference Paper
Full-text available
We describe a minimalist approach to shallow discourse parsing in the context of the CoNLL 2015 Shared Task. Our parser integrates a rule-based component for argument identification and data-driven models for the classification of explicit and implicit relations. We place special emphasis on the evaluation of implicit sense labeling; we present di...
Conference Paper
Full-text available
We propose a generic, memory-based approach for the detection of implicit semantic roles. While state-of-the-art methods for this task combine hand-crafted rules with specialized and costly lexical resources, our models use large corpora with automated annotations for explicit semantic roles only to capture the distribution of predicates and their...
Conference Paper
Full-text available
We provide an overview of on-going efforts to facilitate the study of older Germanic languages currently pursued at the Goethe-University Frankfurt, Germany. We describe created resources, such as a parallel corpus of Germanic Bibles and a morphosyntactically annotated corpus of Old High German (OHG) and Old Saxon, a lexicon of OHG in XML and a mul...
Conference Paper
Full-text available
For the study of historical language varieties, the sparsity of training data imposes immense problems on syntactic annotation and the development of NLP tools that automate the process. In this paper, we explore strategies to compensate for the lack of training data by including data from related varieties in a series of annotation projection experi...
Article
Full-text available
‘Open Data’ has become very important in a wide range of fields. However, for linguistics, much data is still published in proprietary, closed formats and is not made available on the web. We propose the use of linked data principles to enable language resources to be published and interlinked openly on the web, and we describe the application of th...
Article
Full-text available
We describe ongoing community efforts to create a Linked Open Data (sub-)cloud of linguistic resources, with an emphasis on resources that are specific to linguistic research, namely annotated corpora and linguistic databases. We argue that for both types of resources, the application of the Linked Open Data paradigm and the representation in RDF...
Conference Paper
Full-text available
This paper describes a novel approach towards the empirical approximation of discourse relations between different utterances in texts. Following the idea that every pair of events comes with preferences regarding the range and frequency of discourse relations connecting both parts, the paper investigates whether these preferences are manifested in...
Conference Paper
Full-text available
This paper describes POWLA, a generic formalism to represent linguistic annotations in an interoperable way by means of OWL/DL. Unlike other approaches in this direction, POWLA is not tied to a specific selection of annotation layers, but it is designed to support any kind of text-oriented annotation.
Chapter
Full-text available
The Open Linguistics Working Group (OWLG) is an initiative of experts from different fields concerned with linguistic data, including academic linguistics (e.g. typology, corpus linguistics), applied linguistics (e.g. computational linguistics, lexicography and language documentation) and NLP (e.g. from the Semantic Web community). The primary goal...
Chapter
The contributions of this part have described recent activities of the OWLG as a whole and of individual OWLG members aiming to provide linguistic resources as Linked Data. Here, we describe how linguistic resources can be linked with each other, and we illustrate possible use cases of information integration from various sources with example queri...
Article
Full-text available
This paper announces the release of the Ontologies of Linguistic Annotation (OLiA). The OLiA ontologies represent a repository of annotation terminology for various linguistic phenomena across a broad range of languages. This paper summarizes the results of five years of research and describes recent developments and directions for further researc...
Chapter
The explosion of information technology in the last two decades has led to a substantial growth in quantity, diversity and complexity of web-accessible linguistic data. These resources become even more useful when linked with each other, and the last few years have seen the emergence of numerous approaches in various disciplines concerned with ling...
Chapter
This paper describes the application of OWL and RDF to address the interoperability of linguistic corpora and linguistic annotations within such corpora. Interoperability of linguistic corpora involves two aspects: Structural interoperability (annotations of different origin are represented using the same formalism) and conceptual interoperability...
Article
Given the contemporary trend to modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text represents a pervasive problem in everyday NLP practice and poses a non-trivial theoretical problem to the integration of linguistic annotations and their interpretability in general. This paper...
Article
Full-text available
We describe the application of a framework for salience metrics and linguistic variability with respect to the contextually adequate choice of referring expressions and grammatical roles: where multiple meaning-equivalent candidate realizations are available that differ in one of these aspects, NLG systems can apply salience metrics to predict c...
Article
Full-text available
In this paper, we describe tools and resources for the study of African languages developed at the Collaborative Research Centre "Information Structure". These include deeply annotated data collections of 25 sub-Saharan languages that are described together with their annotation scheme, and further, the corpus tool ANNIS that provides a unifie...
Conference Paper
Full-text available
This paper describes the modeling of the morphosyntactic annotations of the MULTEXT-East corpora and lexicons as an OWL/DL ontology. Formalizing annotation schemes in OWL/DL has the advantage of enabling the formal specification of interrelationships between the various features and logical inferences based on the relationships between them. We sho...
Article
Full-text available
The Open Linguistics Working Group (OWLG) is an initiative of experts from different fields concerned with linguistic data, including academic linguistics (e.g. typology, corpus linguistics), applied linguistics (e.g. computational linguistics, lexicography and language documentation), and NLP (e.g. from the Semantic Web community). The primary goa...
Article
Full-text available
This paper describes the creation of a resource of German sentences with multiple automatically created alternative syntactic analyses (parses) for the same text, and how qualitative and quantitative investigations of this resource can be performed using ANNIS, a tool for corpus querying and visualization. Using the example of PP attachment, we...
Conference Paper
Full-text available
This paper describes a series of experiments to test the hypothesis that the parallel application of multiple NLP tools and the integration of their results improves the correctness and robustness of the resulting analysis. It is shown how annotations created by seven NLP tools are mapped onto tool-independent descriptions that are defined with ref...