Johannes Dellert
University of Tuebingen | EKU Tübingen · Department of Linguistics
Doctor of Philosophy
19 Publications
1,747 Reads
179 Citations
Publications (19)
Adding to the budding landscape of advanced analysis tools for Probabilistic Soft Logic (PSL), we present a graphical explorer for grounded PSL models. It exposes the structure of the model from the perspective of any single atom, listing the ground rules in which it occurs. The other atoms in these rules serve as links for navigation through the r...
Integrating datasets from different disciplines is hard because the data are often qualitatively different in meaning, scale and reliability. When two datasets describe the same entities, many scientific questions can be phrased around whether the (dis)similarities between entities are conserved across such different data. Our method, CLARITY, quan...
In speech, the connection between sounds and word meanings is mostly arbitrary. However, among basic concepts of the vocabulary, several words can be shown to exhibit some degree of form–meaning resemblance, a feature labelled vocal iconicity. Vocal iconicity plays a role in first language acquisition and was likely prominent also in pre-historic l...
In the Proto-Quechuan lexicon, many two-segment phonetic substrings recur in semantically related roots, even though they are not independent morphemes. Such elements may have been morphemes before the Proto-Quechuan stage (i.e., in Pre-Proto-Quechuan). On the other hand, this may simply be due to chance, or to phonesthesia. In this paper, we intro...
Integrating datasets from different disciplines is hard because the data are often qualitatively different in meaning, scale, and reliability. When two datasets describe the same entities, many scientific questions can be phrased around whether the similarities between entities are conserved. Our method, CLARITY, quantifies consistency across datas...
This technical report describes a new prototype architecture designed to integrate top-down and bottom-up analysis of non-standard linguistic input, where a semantic model of the context of an utterance is used to guide the analysis of the non-standard surface forms, including their automated normalization in context. While the architecture is gene...
This article describes the first release version of a new lexicostatistical database of Northern Eurasia, which includes Europe as the most well-researched linguistic area. Unlike in other areas of the world, where databases are restricted to covering a small number of concepts as far as possible based on often sparse documentation, good lexical re...
Based on a recently published large-scale lexicostatistical database, we rank 1,016 concepts by their suitability for inclusion in Swadesh-style lists of basic stable concepts. For this, we define separate measures of basicness and stability. Basicness in the sense of morphological simplicity is measured based on information content, a generalizati...
This paper presents a large comparative lexical database which covers about a thousand concepts across twenty Uralic languages. The dataset will be released as the first part of NorthEuraLex, a lexicostatistical database of Northern Eurasia which is being compiled within the EVOLAEMP project. The chief purpose of the lexical database is to serve as...
In this paper we present a parsing architecture that allows processing of different mildly context-sensitive formalisms, in particular Tree-Adjoining Grammar (TAG), Multi-Component Tree-Adjoining Grammar with Tree Tuples (TT-MCTAG) and "simple" Range Concatenation Grammar (RCG). Furthermore, for tree-based grammars, the parser computes not only syn...
We present a very large network of crosslinguistic polysemies, and compare the notion of semantic relatedness it encodes to the catalogue of semantic shifts maintained by the Russian Academy of Sciences. We separately evaluate all types of semantic shifts featured in the catalogue, including shifts occurring during semantic evolution, during borrow...
Existing algorithms for minimal unsatisfiable subset (MUS) extraction are defined independently of any symbolic information, and in current implementations domain experts mostly do not have a chance to influence the extraction process based on their knowledge about the encoded problem. The MUStICCa tool introduces a novel graphical user interface f...
TuLiPA - Parsing extensions of TAG with range concatenation grammars
In this paper we present a parsing framework for extensions of Tree Adjoining Grammar (TAG) called TuLiPA (Tübingen Linguistic Parsing Architecture). In particular, besides TAG, the parser can process Tree-Tuple MCTAG with Shared Nodes (TT-MCTAG), a TAG-extension which has been pr...
In this paper, we present an open-source parsing environment (Tuebingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammar...
Developing linguistic resources, in particular grammars, is known to be a complex task in itself, because of (amongst others) redundancy and consistency issues. Furthermore, some languages can prove hard to describe because of specific characteristics, e.g. the free word order in German. In this context, we present (i) a framework allowi...
We present an approach for querying collections of heterogeneous linguistic corpora that are annotated on multiple layers using arbitrary XML-based markup languages. An OWL ontology provides a homogenising view on the conceptually different markup languages so that a common querying framework can be established using the method of ontology-based qu...
Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories. Editors: Koenraad De Smedt, Jan Hajič and Sandra Kübler. NEALT Proceedings Series, Vol. 1 (2007), 127-138. © 2007 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically p...