About

33 Publications · 3,786 Reads
426 Citations
Publications (33)
Natural language processing techniques can be used to analyze the linguistic content of a document to extract missing pieces of metadata. However, accurate metadata extraction may not depend solely on the linguistics, but also on structural problems such as extremely large documents, unordered multi‐file documents, and inconsistency in manually lab...
Objectives: We survey recent work in biomedical NLP on building more adaptable or generalizable models, with a focus on work dealing with electronic health record (EHR) texts, to better understand recent trends in this area and identify opportunities for future research.
Methods: We searched PubMed, the Institute of Electrical and Electronics Engin...
Building clinical natural language processing (NLP) systems that work on widely varying data is an absolute necessity because of the expense of obtaining new training data. While domain adaptation research can have a positive impact on this problem, the most widely studied paradigms do not take into account the realities of clinical data sharing. T...
The National Environmental Policy Act (NEPA) provides a trove of data on how environmental policy decisions have been made in the United States over the last 50 years. Unfortunately, there is no central database for this information and it is too voluminous to assess manually. We describe our efforts to enable systematic research over US environmen...
This paper presents the first model for time normalization trained on the SCATE corpus. In the SCATE schema, time expressions are annotated as a semantic composition of time entities. This novel schema favors machine learning approaches, as it can be viewed as a semantic parsing task. In this work, we propose a character-level multi-output neural n...
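To make the multi-output idea concrete, here is a minimal sketch of a character-level tagger with two prediction heads, one per annotation layer. The architecture, layer sizes, and label inventories are illustrative assumptions, not the model from the paper.

```python
# Minimal sketch of a character-level multi-output tagger, assuming a SCATE-style
# setup where each character gets both a time-entity label and a sub-entity
# attribute label. All names and dimensions are illustrative.
import torch
import torch.nn as nn

class CharMultiOutputTagger(nn.Module):
    def __init__(self, n_chars, n_entity_types, n_attributes, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb)
        self.rnn = nn.GRU(emb, hidden, batch_first=True, bidirectional=True)
        # Two output heads share one character encoder: one per annotation layer.
        self.entity_head = nn.Linear(2 * hidden, n_entity_types)
        self.attribute_head = nn.Linear(2 * hidden, n_attributes)

    def forward(self, char_ids):                  # (batch, seq_len)
        states, _ = self.rnn(self.embed(char_ids))
        return self.entity_head(states), self.attribute_head(states)

model = CharMultiOutputTagger(n_chars=100, n_entity_types=20, n_attributes=10)
logits_entity, logits_attr = model(torch.randint(0, 100, (2, 50)))
print(logits_entity.shape, logits_attr.shape)  # per-character predictions
```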
In this paper we present an approach to extract ordered timelines of events, their participants, locations and times from a set of Multilingual and Cross-lingual data sources. Based on the assumption that event-related information can be recovered from different documents written in different languages, we extend the Cross-document Event Ordering t...
In this article, we describe a system that reads news articles in four different languages and detects what happened, who is involved, where and when. This event-centric information is represented as episodic situational knowledge on individuals in an interoperable RDF format that allows for reasoning on the implications of the events. Our system...
This paper presents a novel approach to improve the interoperability between four semantic resources that incorporate predicate information. Our proposal defines a set of automatic methods for mapping the semantic knowledge included in WordNet, VerbNet, PropBank and FrameNet. We use advanced graph-based word sense disambiguation algorithms and corp...
We describe a novel modular system for cross-lingual event extraction for English, Spanish, Dutch and Italian texts. The system consists of a ready-to-use modular set of advanced multilingual Natural Language Processing (NLP) tools. The pipeline integrates modules for basic NLP processing as well as more advanced tasks such as cross-lingual Named...
This paper investigates the contribution of document-level processing of time-anchors for TimeLine event extraction. We developed and tested two different systems. The first one is a baseline system that captures explicit time-anchors. The second one extends the baseline system by also capturing implicit time relations. We have evaluated both appr...
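As a rough illustration of the explicit-anchor baseline, the sketch below orders only those events that carry a normalized date; the Event structure and the example data are invented, and the unanchored events are exactly what the implicit-relation extension would recover.

```python
# Minimal sketch of the explicit-anchor baseline idea, assuming events have
# already been extracted with an optional normalized date. Illustrative only.
from dataclasses import dataclass
from datetime import date

@dataclass
class Event:
    mention: str
    anchor: date | None  # normalized explicit time-anchor, if any

events = [
    Event("acquired", date(2014, 3, 1)),
    Event("announced", None),              # only implicitly related to "acquired"
    Event("resigned", date(2013, 11, 20)),
]

# Baseline timeline: keep only explicitly anchored events, order them by date.
timeline = sorted((e for e in events if e.anchor), key=lambda e: e.anchor)
for e in timeline:
    print(e.anchor.isoformat(), e.mention)
```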
The European project NewsReader develops advanced technology to process daily news streams in 4 languages, extracting what happened, when and where it happened and who was involved. NewsReader reads massive amounts of news coming from thousands of sources. It compares the results across sources to complement information and determine where the diff...
This paper presents the first steps towards building the Predicate Matrix, a new lexical resource resulting from the integration of multiple sources of predicate information including FrameNet (Baker et al., 1997), VerbNet (Kipper, 2005), PropBank (Palmer et al., 2005) and WordNet (Fellbaum, 1998). By using the Predicate Matrix, we expect to provid...
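For a sense of what one Predicate Matrix alignment could look like, here is an illustrative row linking a single predicate sense across the four resources; the field names and lookup helper are assumptions for illustration, not the resource's actual format.

```python
# Illustrative sketch of a Predicate Matrix row: one predicate sense aligned
# across WordNet, VerbNet, PropBank and FrameNet. Field layout is assumed.
predicate_matrix_row = {
    "wordnet": "buy%2:40:00::",        # WordNet sense key
    "verbnet": "get-13.5.1",           # VerbNet class
    "propbank": "buy.01",              # PropBank roleset
    "framenet": "Commerce_buy",        # FrameNet frame
}

def frame_for_roleset(matrix, roleset):
    """Look up the FrameNet frame aligned with a PropBank roleset."""
    return next((row["framenet"] for row in matrix if row["propbank"] == roleset), None)

print(frame_for_roleset([predicate_matrix_row], "buy.01"))  # Commerce_buy
```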
This paper presents a novel deterministic algorithm for implicit Semantic Role Labeling. The system exploits a very simple but relevant discursive property, the argument coherence over different instances of a predicate. The algorithm solves the implicit arguments sequentially, exploiting not only explicit but also the implicit arguments previously...
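A minimal sketch of the argument-coherence idea: when a predicate instance lacks a role, reuse the filler that the same role received in an earlier instance of that predicate, whether explicit or already resolved. The data structures and example are illustrative, not the paper's implementation.

```python
# Deterministic sequential filling of implicit arguments, exploiting argument
# coherence across instances of the same predicate. Structures are illustrative.
def fill_implicit_arguments(instances):
    """instances: list of (predicate, {role: filler_or_None}) in document order."""
    last_filler = {}  # (predicate, role) -> most recent filler
    for predicate, roles in instances:
        for role, filler in roles.items():
            if filler is None:
                # Implicit argument: inherit the coherent filler seen earlier.
                roles[role] = last_filler.get((predicate, role))
            if roles[role] is not None:
                last_filler[(predicate, role)] = roles[role]
    return instances

doc = [
    ("sell", {"Arg0": "the company", "Arg1": "its assets"}),
    ("sell", {"Arg0": None, "Arg1": "the remaining stock"}),  # implicit seller
]
print(fill_implicit_arguments(doc))  # Arg0 of the 2nd instance -> "the company"
```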
Following the frame semantics paradigm, we present a novel strategy for solving null-instantiated arguments. Our method learns probability distributions of semantic types for each Frame Element from explicit corpus annotations. These distributions are used to select the most probable missing implicit arguments together with its most probable filler...
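To illustrate the distribution-learning step, the sketch below counts which semantic types fill each Frame Element in explicit annotations and returns the most probable type for a null-instantiated element; the frame, element, and type labels are invented examples.

```python
# Learn per-Frame-Element distributions of semantic types from explicit
# annotations, then pick the most probable type for a missing argument.
from collections import Counter, defaultdict

explicit_annotations = [
    ("Commerce_buy", "Buyer", "PERSON"),
    ("Commerce_buy", "Buyer", "ORGANIZATION"),
    ("Commerce_buy", "Buyer", "PERSON"),
    ("Commerce_buy", "Goods", "ARTIFACT"),
]

type_counts = defaultdict(Counter)
for frame, element, sem_type in explicit_annotations:
    type_counts[(frame, element)][sem_type] += 1

def most_probable_type(frame, element):
    counts = type_counts[(frame, element)]
    sem_type, n = counts.most_common(1)[0]
    return sem_type, n / sum(counts.values())

print(most_probable_type("Commerce_buy", "Buyer"))  # ('PERSON', 0.666...)
```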
This paper presents a novel automatic approach to partially integrate FrameNet and WordNet. In that way we expect to extend FrameNet coverage, to enrich WordNet with frame semantic information and possibly to extend FrameNet to languages other than English. The method uses a knowledge-based Word Sense Disambiguation algorithm for matching the Frame...
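As a crude stand-in for the knowledge-based WSD step, the sketch below matches a lexical unit to the WordNet synset whose gloss overlaps most with the frame definition (a Lesk-style heuristic, not the algorithm from the paper); it assumes NLTK with the WordNet data installed.

```python
# Lesk-style illustration of matching a FrameNet lexical unit to a WordNet
# synset by gloss overlap. Requires: pip install nltk, then
# nltk.download('wordnet'). The frame definition below is paraphrased.
from nltk.corpus import wordnet as wn

def match_lexical_unit(lemma, frame_definition):
    frame_words = set(frame_definition.lower().split())
    def overlap(synset):
        return len(frame_words & set(synset.definition().lower().split()))
    candidates = wn.synsets(lemma)
    return max(candidates, key=overlap) if candidates else None

frame_def = "Words describing a commercial transaction in which a buyer obtains goods"
print(match_lexical_unit("buy", frame_def))
```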
Resumen (translated from Spanish): The KYOTO project builds a language-independent information system for a specific domain (environment, ecology and diversity) based on a language-independent ontology that will be linked to Wordnets in seven languages. Keywords: Information Extraction. Abstract: The KYOTO project will construct a langua...
This paper presents the complete and consistent ontological annotation of the nominal part of WordNet. The annotation has been carried out using the semantic features defined in the EuroWordNet Top Concept Ontology and made available to the NLP community. Up to now only an initial core set of 1,024 synsets, the so-called Base Concepts, was ontolo...
This paper describes the connection of WordNet to a generic ontology based on DOLCE. We developed a complete set of heuristics for mapping all WordNet nouns, verbs and adjectives to the ontology. Moreover, the mapping also allows predicates to be represented in a uniform and interoperable way, regardless of the way they are expressed in the text and in...