Nils Diewald

Institute for the German Language

About

22
Publications
3,715
Reads
101
Citations

Publications (22)
Chapter
KorAP, the new corpus analysis platform of the IDS, which will replace COSMAS II over the next 2-3 years, offers several special functionalities, particularly for research on grammatical variation. A fundamental aspect, for example, is the representation and querying in KorAP of arbitrary annotation layers in arbitrary number, for example on constitue...
Conference Paper
Full-text available
KorAP is a corpus search and analysis platform, developed at the Institute for the German Language (IDS). It supports very large corpora with multiple annotation layers, multiple query languages, and complex licensing scenarios. KorAP's design aims to be scalable, flexible, and sustainable to serve the German Reference Corpus DEREKO for at least th...
Chapter
As the previous chapters have shown, the possible ways of representing linguistic data as a graph are as diverse as the data itself. For the process of graph modeling, the decision as to what information will be represented as nodes and what information as relations is of great importance. In addition, what kind of added value is going to be expect...
Conference Paper
Full-text available
The task-oriented and format-driven development of corpus query systems has led to the creation of numerous corpus query languages (QLs) that vary strongly in expressiveness and syntax. This is a severe impediment for the interoperability of corpus analysis systems, which lack a common protocol. In this paper, we present KoralQuery, a JSON-LD based...
Conference Paper
Full-text available
We present an approach to an aspect of managing complex access scenarios to large and heterogeneous corpora that involves handling user queries that, intentionally or due to the complexity of the queried resource, target texts or annotations outside of the given user's permissions. We first outline the overall architecture of the corpus analysis pl...
Conference Paper
Full-text available
Both XML namespaces and standoff annotation are promising approaches to tackle possibly overlapping multiple annotation layers in XML instances. The creation and processing of standoff instances can be cumbersome – especially when the underlying textual primary data is allowed to be modified after the annotation has been added. In this paper we pre...
Chapter
Using the GuttenPlag Wiki as an example, our contribution examines the interdependence and dynamics between collaborative and interactive actions of actors on the World Wide Web. The example is particularly well suited, on the one hand, to the specific dynamics of a collaboration network on the World Wide Web with regard to the emerging participation...
Chapter
Full-text available
Modern NLP systems rely either on unsupervised methods, or on data created as part of governmental initiatives such as MUC, ACE, or GALE. The data created in these efforts tend to be annotated according to task-specific schemes. The Anaphoric Bank is an attempt to create large quantities of data annotated with anaphoric information according to a g...
Article
Full-text available
In this article, we test a variant of the Sapir-Whorf Hypothesis in the area of complex network theory. This is done by analyzing social ontologies as a new resource for automatic language classification. Our method is to solely explore structural features of social ontologies in order to predict family resemblances of languages used by the corresp...
Article
Full-text available
In this paper, the authors induce linguistic networks as a prerequisite for detecting language change by means of the Patrologia Latina, a corpus of Latin texts from the 4th to the 13th century.
Technical Report
Full-text available
Article
Full-text available
The article first presents an annotation scheme for semantic relations that was developed to describe a German-language corpus for training and evaluating an anaphora resolution system. Second, it describes the web-based annotation tool Serengeti, which for the annotation of anaphoric relations in the proj...
Article
Full-text available
Annotating large text corpora is a time-consuming effort. Although single-user annotation tools are available, web-based annotation applications allow for distributed annotation and file access from different locations. In this paper we present the web-based annotation application Serengeti for annotating anaphoric relations which will be extended...
Technical Report
Full-text available

Projects (2)