Felix Sasaki

  • Senior Researcher at Deutsches Forschungszentrum für Künstliche Intelligenz

About

  • 44 Publications
  • 5,387 Reads
  • 182 Citations
Current institution
  • Deutsches Forschungszentrum für Künstliche Intelligenz
Current position
  • Senior Researcher

Publications

Publications (44)
Conference Paper
This paper introduces the current state of the FREME framework. The paper puts FREME into the context of linguistic linked data and related approaches of multilingual and semantic processing. In addition, we focus on two specific aspects of FREME: the FREME NER e-Service, and chaining of FREME e-Services. We believe that the flexible and distribute...
Article
Full-text available
With the ever increasing availability of linked multilingual lexical resources, there is a renewed interest in extending Natural Language Processing (NLP) applications so that they can make use of the vast set of lexical knowledge bases available in the Semantic Web. In the case of Machine Translation, MT systems can potentially benefit from such a...
Conference Paper
In an attempt to put a Semantic Web-layer that provides linguistic analysis and discourse information on top of digital content, we develop a platform for digital curation technologies. The platform offers language-, knowledge- and data-aware services as a flexible set of workflows and pipelines for the efficient processing of various types of digi...
Chapter
This contribution examines, in the context of multilingual semantic applications, the role of selected technologies and standards. Standardized semantic resources, and standardized procedures for using them in language-technology applications and workflows, have the potential to decisively improve the quality of these applications and...
Article
As many software applications have moved from a desktop software deployment model to a Software-as-a-Service (SaaS) model, tool vendors in the language service industry have likewise moved to a SaaS model, e.g. for web-based Computer Assisted Translation (CAT) tools. However, many of these offerings fail to take full advantage of the Open Web Plat...
Article
Full-text available
We report on the MultilingualWeb initiative, a collaboration between the W3C Internationalization Activity and the European Commission, realized as a series of EC-funded projects. We review the outcomes of "MultilingualWeb", which conducted 4 workshops analyzing "gaps" within Web standardization that currently hinder multilinguality. Gap analysis l...
Conference Paper
We have developed DBpedia Spotlight, a flexible concept tagging system that is able to tag – i.e. annotate – entities, topics and other terms in natural language text. The system starts by recognizing phrases to annotate in the input text, and subsequently disambiguates them to a reference knowledge base extracted from Wikipedia. In this paper we e...
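The workflow described here, spotting phrases in the input text and then disambiguating them against a Wikipedia-derived knowledge base, can be exercised through DBpedia Spotlight's public web service. The sketch below is a minimal illustration; the api.dbpedia-spotlight.org endpoint, the text and confidence parameters, and the JSON response keys are assumptions about the current public service, not details taken from the paper.

    # Hedged sketch: annotate free text with DBpedia Spotlight via its REST API.
    # Endpoint URL, parameters, and response keys are assumptions for illustration.
    import requests

    SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"  # assumed public endpoint

    def annotate(text, confidence=0.5):
        """Return DBpedia resources recognized in `text` (phrase spotting + disambiguation)."""
        response = requests.get(
            SPOTLIGHT_URL,
            params={"text": text, "confidence": confidence},
            headers={"Accept": "application/json"},
            timeout=30,
        )
        response.raise_for_status()
        # Each resource carries the surface form, its character offset, and the DBpedia URI
        # it was disambiguated to, i.e. the knowledge base extracted from Wikipedia.
        return [
            (r["@surfaceForm"], int(r["@offset"]), r["@URI"])
            for r in response.json().get("Resources", [])
        ]

    if __name__ == "__main__":
        for surface, offset, uri in annotate("Berlin is the capital of Germany."):
            print(f"{surface} @ {offset} -> {uri}")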
Article
In the European information society of the 21st century, no one should suffer a social or economic disadvantage simply for speaking only their mother tongue, be it Latvian, Hungarian or Portuguese. Language technology has the potential to meet this challenge, provided it is robust and cost-effective...
Article
Full-text available
In the European information society of the 21st century, no one should suffer a social or economic disadvantage simply for speaking only their mother tongue, be it Latvian, Hungarian or Portuguese. Language technology has the potential to meet this challenge, provided it is robust and cost-effective...
Chapter
The success of the markup language XML is partly due to its internationalization capabilities. Here, “internationalization” means the readiness of a product or a technology for an international market and for users with different languages, cultures and cultural preferences. The aim of the paper is threefold. First, it introduces aspects of internationalizat...
Article
Full-text available
XML has many built-in capabilities to support the worldwide use of content. Proper use of these capabilities for the purpose of internationalization (i18n) and localization (l10n), however, sometimes requires considerable expertise. This holds especially for developers of XML schemas, and producers of XML instances (such as authors or translators)....
Article
This paper takes a "real world scenario" of re-using XML in different contexts, with different people participating in different places over time. Various approaches to re-use, such as architectural forms or (possibly RDF-based) markup semantics, are discussed. The proposed methodology for re-use is based on these approaches, but also - somehow sur...
Article
Full-text available
An abstract is not available.
Article
An approach to the unification of XML (Extensible Markup Language) documents with identical textual content and concurrent markup in the framework of XML-based multi-layer annotation is introduced. A Prolog program allows the possible relationships between element instances on two annotation layers that share PCDATA to be explored and also the comp...
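The core of the approach, relating element instances on two annotation layers that share the same PCDATA, can be pictured independently of the Prolog formalization. Below is a minimal Python sketch, assuming each element has already been reduced to a character span over the common text; the relation names and the span representation are illustrative assumptions, not the paper's calculus.

    # Hedged sketch: classify the relation between element instances on two
    # annotation layers over the same text, each element given as a (start, end)
    # character span. Relation names are illustrative, not the paper's inventory.
    def classify(a, b):
        """Classify the relation between two (start, end) character spans."""
        if a == b:
            return "identity"
        if a[0] <= b[0] and b[1] <= a[1]:
            return "includes"
        if b[0] <= a[0] and a[1] <= b[1]:
            return "included_in"
        if a[1] <= b[0] or b[1] <= a[0]:
            return "disjoint"
        return "overlap"

    def relate_layers(layer1, layer2):
        """Yield (element1, element2, relation) for all cross-layer element pairs."""
        for name1, span1 in layer1:
            for name2, span2 in layer2:
                yield name1, name2, classify(span1, span2)

    if __name__ == "__main__":
        syntax = [("np", (0, 9)), ("vp", (10, 24))]          # layer 1: elements with spans
        prosody = [("phrase", (0, 24)), ("accent", (4, 9))]  # layer 2 over the same text
        for e1, e2, rel in relate_layers(syntax, prosody):
            print(e1, rel, e2)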
Conference Paper
Secondary Information Structuring is the key part of an architecture, which is used for the knowledge-based, vertical interrelation of information resources: Primary Information Structuring (document grammars, marked-up instance documents) and abstract, conceptual resources (conceptual models, ontologies). Secondary Information Structuring encompas...
Article
Full-text available
This paper deals with the problem of how to interrelate theory-specific treebanks and how to transform one treebank format to another. Currently, two approaches to achieve these goals can be differentiated. The first creates a mapping algorithm between treebank formats [18]. Categories of a source format are transformed into a target format via a g...
Article
This paper describes a corpus of Japanese task-oriented dialogues, i.e. its data, annotations, analysis methodology and preliminary results for the modeling of co-referential phenomena. Current corpus-based approaches to co-reference concentrate on textual data from English or other European languages. Hence, the emerging language-general...
Article
Full-text available
Schema languages concentrate on grammatical constraints on document structures, i.e. hierarchical relations between elements in a tree-like structure. In this paper, we complement this concept with a methodology for defining and applying structural constraints from the perspective of a single element. These constraints can be used in addition to th...
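One way to picture such element-centred constraints is as XPath tests evaluated with each instance of an element as the context node, checked alongside ordinary schema validation. The sketch below assumes the lxml library and an ad hoc rule format; it illustrates the general idea, not the formalism defined in the paper.

    # Hedged sketch of element-centred structural constraints checked in addition
    # to an ordinary schema: each rule pairs an element name with an XPath test
    # evaluated with every instance of that element as the context node.
    from lxml import etree  # assumes lxml is available for full XPath support

    RULES = [
        ("word", "ancestor::sentence"),            # every <word> must sit inside a <sentence>
        ("sentence", "count(child::word) >= 1"),   # every <sentence> needs at least one <word>
    ]

    def check(tree):
        """Return (element path, failed test) pairs for all violated constraints."""
        violations = []
        for name, test in RULES:
            for element in tree.getroot().iter(name):
                if not element.xpath(test):
                    violations.append((tree.getpath(element), test))
        return violations

    if __name__ == "__main__":
        tree = etree.ElementTree(etree.XML(
            "<text><word>stray</word><sentence><word>ok</word></sentence></text>"
        ))
        for path, test in check(tree):
            print(f"{path} violates: {test}")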
Conference Paper
Full-text available
The extraction of lexical information for machine-readable lexica from multilevel annotations is addressed in this paper. Relations between these levels of annotation are used for sub-classification of lexical entries. A method for relating annotation units is presented, based on a temporal calculus. Relating the annotation units manually is err...
Article
Full-text available
This article introduces a dialogue corpus containing data from two typologically different languages, Japanese and Kilivila. The corpus is annotated in accordance with language specific annotation schemes for co-referential and similar relations. The article describes the corpus data, the properties of language specific co-reference in the two lang...
Article
Full-text available
With MetaLex we introduce a framework for metadata management where information can be inferred from different areas of metadata coding, such as metadata for catalogue descriptions, linguistic levels, or tiers. This is done for consistency and efficiency in metadata recording and applies the same inference techniques that are used for lexical infer...
Article
Full-text available
This paper introduces ongoing and current work within Internationalization (i18n) Activity, in the World Wide Web Consortium (W3C). The focus is on aspects of the W3C i18n Activity which are of benefit for the creation and manipulation of multilingual language resources. In particular, the paper deals with ongoing work concerning encoding, visualiz...
Article
Full-text available
This paper proposes a methodology for querying linguistic data represented in different corpus formats. Examples of the need for queries over such heterogeneous resources are the corpus-based analysis of multimodal phenomena like the interaction of gestures and prosodic features, or syntax-related phenomena like information structure which exceed t...
Article
Full-text available
A common problem in the design of document grammars is to define a content model which is neither too specific nor too general. Within the framework of DTDs, the user is able to specify the content model of an element accordingly; such a declaration is general enough to annotate word structure in languages with an agglutination system...
Article
Many XML-related activities (e.g. the creation of a new schema) already address issues with different languages, scripts, and cultures. Nevertheless, a need exists for additional mechanisms and guidelines for more effective internationalization (i18n) and localization (l10n) in XML-related contents and processes. The W3C Internationalization Tag Se...
Article
This article describes a system that opens up Japanese-language corpora for "Western" language-related research by enriching them with German-language information (word-by-word translations, morphological categories and Latin transliteration). Freely available lexica on the WWW are used for this purpose. A first application of the...
Article
Full-text available
This article addresses options for analysing and modelling information units in text-technological corpora. It presents a method for determining information units on the basis of their structural-positional properties. The central question is how "structural properties" are to be defined and...
Article
Full-text available
We have developed DBpedia Spotlight, a flexible concept tagging system that is able to tag – i.e. annotate – entities, topics and other terms in natural language text. The system starts by recognizing phrases to annotate in the input text, and subsequently disambiguates them to a reference knowledge base extracted from Wikipedia. In this paper we e...
Article
The starting point of this work is the methodology of text-technological information modelling, which uses standardized formats for modelling informational resources. Texts, as one example of an informational resource, can be enriched with information at various levels, most of which stand in hierarchical relations to one another...
