About
44
Publications
5,387
Reads
182
Citations
Publications (44)
This paper introduces the current state of the FREME framework. The paper puts FREME into the context of linguistic linked data and related approaches of multilingual and semantic processing. In addition, we focus on two specific aspects of FREME: the FREME NER e-Service, and chaining of FREME e-Services. We believe that the flexible and distribute...
With the ever increasing availability of linked multilingual lexical resources, there is a renewed interest in extending Natural Language Processing (NLP) applications so that they can make use of the vast set of lexical knowledge bases available in the Semantic Web. In the case of Machine Translation, MT systems can potentially benefit from such a...
In an attempt to put a Semantic Web-layer that provides linguistic analysis and discourse information on top of digital content, we develop a platform for digital curation technologies. The platform offers language-, knowledge- and data-aware services as a flexible set of workflows and pipelines for the efficient processing of various types of digi...
This contribution examines the role of selected technologies and standards in the context of multilingual semantic applications. Standardized semantic resources, and standardized procedures for using them in language technology applications and workflows, have the potential to decisively improve the quality of the applications and the...
Just as many software applications have moved from a desktop software deployment model to a Software-as-a-Service (SaaS) model, tool vendors in the language service industry have moved to a SaaS model, e.g., for web-based Computer Assisted Translation (CAT) tools. However, many of these offerings fail to take full advantage of the Open Web Plat...
We report on the MultilingualWeb initiative, a collaboration between the W3C Internationalization Activity and the European Commission, realized as a series of EC-funded projects. We review the outcomes of "MultilingualWeb", which conducted 4 workshops analyzing "gaps" within Web standardization that currently hinder multilinguality. Gap analysis l...
We have developed DBpedia Spotlight, a flexible concept tagging system that is able to tag (i.e. annotate) entities, topics and other terms in natural language text. The system starts by recognizing phrases to annotate in the input text, and subsequently disambiguates them to a reference knowledge base extracted from Wikipedia. In this paper we e...
In the European information society of the 21st century, no one should suffer a social or economic disadvantage merely from speaking their mother tongue, be it Latvian, Hungarian or Portuguese. Language technology has the potential to meet this challenge, provided it is robust and cost-effecti...
The success of the markup language XML is partly due to its internationalization capabilities. “Internationalization” means the readiness of a product or a technology for an international market and for users with different languages, cultures and cultural preferences. The aim of the paper is threefold. First, it introduces aspects of internationalizat...
XML has many built-in capabilities to support the worldwide use of content. Proper use of these capabilities for the purpose of internationalization (i18n) and localization (l10n), however, sometimes requires considerable expertise. This holds especially for developers of XML schemas, and producers of XML instances (such as authors or translators)....
This paper takes a "real world scenario" of re-using XML in different contexts, with different people participating in different places over time. Various approaches for re-usage like architectural forms or (possibly RDF-based) markup semantics are discussed. The proposed methodology for re-usage is based on these approaches, but also - somehow sur...
An abstract is not available.
An approach to the unification of XML (Extensible Markup Language) documents with identical textual content and concurrent markup in the framework of XML-based multi-layer annotation is introduced. A Prolog program allows the possible relationships between element instances on two annotation layers that share PCDATA to be explored and also the comp...
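The kind of relationship the abstract above refers to can be illustrated with a small sketch. This is not the Prolog program from the paper. It is a minimal Python illustration that assumes each element instance is reduced to a pair of character offsets into the shared text. The `Span` type, the `relation` function and the relation names are hypothetical and chosen only for this example.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """An element instance on one annotation layer (hypothetical model)."""
    name: str
    start: int  # inclusive character offset into the shared text
    end: int    # exclusive character offset into the shared text


def relation(a: Span, b: Span) -> str:
    """Classify how two non-empty spans over the same text relate."""
    if a.end <= b.start or b.end <= a.start:
        return "disjoint"          # no shared characters
    if (a.start, a.end) == (b.start, b.end):
        return "identity"          # both elements cover the same text
    if a.start <= b.start and b.end <= a.end:
        return "a_includes_b"      # b lies entirely inside a
    if b.start <= a.start and a.end <= b.end:
        return "b_includes_a"      # a lies entirely inside b
    return "overlap"               # partial overlap, not nestable in one tree


# Shared text: "the big dog" (11 characters)
word = Span("w", 0, 3)        # "the" on layer 1
phrase = Span("np", 0, 11)    # whole text on layer 2
chunk_a = Span("x", 0, 7)     # "the big"
chunk_b = Span("y", 4, 11)    # "big dog"
```

Spans classified as `overlap` are the ones that cannot be merged into a single well-formed XML hierarchy without further treatment, which is why a unification step needs to detect them.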
Secondary Information Structuring is the key part of an architecture, which is used for the knowledge-based, vertical interrelation of information resources: Primary Information Structuring (document grammars, marked-up instance documents) and abstract, conceptual resources (conceptual models, ontologies). Secondary Information Structuring encompas...
This paper deals with the problem of how to interrelate theory-specific treebanks and how to transform one treebank format to another. Currently, two approaches to achieve these goals can be differentiated. The first creates a mapping algorithm between treebank formats [18]. Categories of a source format are transformed into a target format via a g...
This paper describes a corpus of Japanese task-oriented dialogues, i.e. its data, annotations, analysis methodology and preliminary results for the modeling of co-referential phenomena. Current corpus-based approaches to co-reference concentrate on textual data from English or other European languages. Hence, the emerging language-general...
Schema languages concentrate on grammatical constraints on document structures, i.e. hierarchical relations between elements in a tree-like structure. In this paper, we complement this concept with a methodology for defining and applying structural constraints from the perspective of a single element. These constraints can be used in addition to th...
The extraction of lexical information for machine readable lexica from multilevel annotations is addressed in this paper. Relations between these levels of annotation are used for sub-classification of lexical entries. A method for relating annotation units is presented, based on a temporal calculus. Relating the annotation units manually is err...
This article introduces a dialogue corpus containing data from two typologically different languages, Japanese and Kilivila. The corpus is annotated in accordance with language specific annotation schemes for co-referential and similar relations. The article describes the corpus data, the properties of language specific co-reference in the two lang...
With MetaLex we introduce a framework for metadata management where information can be inferred from different areas of metadata coding, such as metadata for catalogue descriptions, linguistic levels, or tiers. This is done for consistency and efficiency in metadata recording and applies the same inference techniques that are used for lexical infer...
This paper introduces ongoing and current work within the Internationalization (i18n) Activity of the World Wide Web Consortium (W3C). The focus is on aspects of the W3C i18n Activity which are of benefit for the creation and manipulation of multilingual language resources. In particular, the paper deals with ongoing work concerning encoding, visualiz...
This paper proposes a methodology for querying linguistic data represented in different corpus formats. Examples of the need for queries over such heterogeneous resources are the corpus-based analysis of multimodal phenomena like the interaction of gestures and prosodic features, or syntax-related phenomena like information structure which exceed t...
A common problem in the design of document grammars is to define a content model which is not too specific, but also not too general. Within the framework of DTDs, the user is able to specify the content model of an element like this: This declaration is general enough to annotate word structure in languages with an agglutination system...
Many XML-related activities (e.g. the creation of a new schema) already address issues with different languages, scripts, and cultures. Nevertheless, a need exists for additional mechanisms and guidelines for more effective internationalization (i18n) and localization (l10n) in XML-related contents and processes. The W3C Internationalization Tag Se...
This paper describes a system that makes Japanese-language corpora accessible to "Western" language-related research by enriching them with German-language information (single-word translations, morphological categories and Latin transliteration). It uses lexica that are freely available via the WWW. A first application of...
This article addresses options for analysing and modelling information units in text-technological corpora. It presents a method for identifying information units by their structural-positional properties. The central question is how "structural properties" are to be defined and...
The starting point of this work is the methodology of text-technological information modelling, which uses standardized formats to model informational resources. Texts, as one example of an informational resource, can on several levels, most of which stand in hierarchical relations to one another, be [...] with informati...