
Paloma Moreda - University of Alicante
68 Publications
15,475 Reads
482 Citations
Publications (68)
Post-truth is a term that describes a distorting phenomenon that aims to manipulate public opinion and behavior. One of its key engines is the spread of Fake News. Nowadays most news is rapidly disseminated in written language via digital media and social networks. Therefore, to detect fake news it is becoming increasingly necessary to apply Artifi...
The extraction of domain terminology is a task that is increasingly used in different natural language applications, such as information retrieval, the creation of specialized corpora, question-answering systems, the creation of ontologies and the automatic classification of documents. This task of the extraction of domain terminology...
This paper presents a named entity classification system, which employs the Random Forest machine learning algorithm. Our feature set includes local entity information and profiles, all of which is generated in an unsupervised manner. Performance on various languages (Spanish, Dutch and English) and domains (general and medical) demonstrates the flexibi...
This paper presents a named entity classification system, designed to be language independent. Our methodology employs Random Forest, a supervised machine learning algorithm, and its features are generated in an unsupervised manner. Our feature set includes local information from the entity and profiles (context information), without external knowl...
This paper presents a Named Entity Classification system, which employs machine learning. Our methodology employs local entity information and profiles as its feature set. All features are generated in an unsupervised manner. It is tested on two different data sets: (i) DrugSemantics Spanish corpus (Overall F1 = 74.92), whose results are in line with t...
Named Entity Recognition and Classification (NERC) is a prerequisite to other natural language processing applications. Nevertheless, the adaptation of NERC systems is expensive given that most of them only work appropriately on the domain for which they were created. Bearing this idea in mind, a named entity classification system, which is profile...
For the healthcare sector, it is critical to exploit the vast amount of textual health-related information. Nevertheless, healthcare providers have difficulty benefiting from such a quantity of data during pharmacotherapeutic care. The problem is that such information is stored in different sources and their consultation time is limited. In this co...
How and where do we put the gender perspective into practice in our university teaching? And how do we develop the gender perspective with a cross-cutting approach?
This paper presents a Named Entity Classification system, which uses profiles and machine learning. To confirm its domain independence, it is tested on two domains: general (CONLL2002 corpus) and medical (DrugSemantics gold standard). Given our overall results (CONLL2002, F1 = 67.06; DrugSemantics, F1 = 71.49), our methodology has proven...
The writing style used in social media usually contains informal elements that can lower the performance of Natural Language Processing applications. For this reason, text normalisation techniques have drawn a lot of attention recently when dealing with informal content. However, not all the texts present the same level of informality and may not r...
This paper describes an active ingredients named entity recogniser. Our machine learning system, which is language and domain independent, employs unsupervised feature generation and weighting from the training data. The proposed automatic feature extraction process is based on generating a profile for the given entity without traditional knowledge...
Nowadays, Named Entity Recognition systems in the pharmacological domain, which are needed to help healthcare professionals during pharmacological treatment prescription, suffer from limitations related to the lack of coverage in official databases. Therefore, it seems necessary to analyse the reliability of existing resources, both in the Semantic Web an...
The purpose of this study is to design and develop an accessible web application, named Playlingua, to improve the language learning process and reading comprehension through gamification, by using Natural Language Processing (NLP) tools and techniques. We will analyze the advantages and positive effects of gamification in learning applications and...
This paper describes a medicinal products and active ingredients named entity recogniser (MaNER) for Spanish technical documents. This rule-based system uses high-quality and low-maintenance lexicons. Our results (F-measure 90%) prove that dictionary-based approaches, without any deep natural language processing (e.g. POS tagging), can achieve a h...
Autism Spectrum Disorder (ASD) is a condition that impairs the proper development of people's cognitive functions, social skills, and communicative abilities. A significant percentage of autistic people have inadequate reading comprehension skills. The European project FIRST is focused on developing a multilingual tool called Open Book that applies Hu...
In this paper we describe Fénix, a data model for exchanging information between Natural Language Processing applications. The format proposed is intended to be flexible enough to cover both current and future data structures employed in the field of Computational Linguistics. The Fénix architecture is divided into four separate layers: conceptual, l...
User-generated content has become a recurrent resource for NLP tools and applications, hence many efforts have been made lately in order to handle the noise present in short social media texts. The use of normalisation techniques has been proven useful for identifying and replacing lexical variants on some of the most informal genres such as microb...
This introduction provides an overview of the state-of-the-art technology in Applications of Natural Language to Information Systems. Specifically, we analyze the need for such technologies to successfully address the new challenges of modern information systems, in which the exploitation of the Web as a main data source on business systems becomes...
This paper describes our participation in the profiling (polarity classification) task of the RepLab 2013 workshop. This task is focused on determining whether a given text from Twitter contains a positive or a negative statement related to the reputation of a given entity. We cover three different approaches, one unsupervised and two supervised....
Its lexical richness and the ease of access to large volumes of information make Web 2.0 an important resource for Natural Language Processing. Nevertheless, the frequent presence of non-normative linguistic phenomena can make any automatic processing challenging. This paper describes the participation in the Text Normalisat...
A basic task in opinion mining deals with determining the overall polarity orientation of a document about some topic. This has several applications such as detecting consumer opinions in on-line product reviews or increasing the effectiveness of social media marketing campaigns. However, the informal features of Web 2.0 texts can affect the perfor...
Treatment of the spatial dimension in text and its application to information retrieval. Abstract: Emerging project focused on toponym disambiguation and the detection of geographical focus in text. The aim is to improve the performance of geographic information retrieval systems. The problems addressed are described...
Its lexical richness and the ease of access to large volumes of information make Web 2.0 an important resource for Natural Language Processing. Nevertheless, the frequent presence of non-normative linguistic phenomena can make any automatic processing challenging. We therefore propose in this study the normalisation of non-normati...
Abstract: Iarg-AnCora aims to annotate the implicit arguments of deverbal nominalizations in the AnCora corpus. This corpus will be the basis for systems of automatic semantic role labeling based on machine learning techniques. Semantic analyzers are essential components in the current applications of language technologies, in which it is important to...
Iarg-AnCora aims to annotate the implicit arguments of deverbal nominalizations in the AnCora corpus. This corpus will be the basis for systems of automatic semantic role labeling based on machine learning techniques. Semantic analyzers are essential components in the current applications of language technologies, in which it is important to obtain a d...
This project is focused on toponym disambiguation and geographical focus identification in text. The goal is to improve the performance of geographic information retrieval systems. This paper describes the problems faced, working hypothesis, tasks proposed and goals currently achieved. © 2012 Sociedad Española para el Procesamiento del Lenguaje Nat...
The language used in Web 2.0 applications such as blogging platforms, real-time chats, social networks or collaborative encyclopaedias shows remarkable differences in comparison with traditional texts. The presence of informal features such as emoticons, spelling errors or Internet-specific slang can lower the performance of Natural Language Process...
Web 2.0, through its different platforms, such as blogs, social networks, microblogs, or forums, allows users to freely write content on the Internet in order to provide, share and use information. However, the non-standard features of the language used in Web 2.0 publications can make social media content less accessible than traditiona...
The data made available by Web 2.0 applications such as social networks, on-line chats or blogs have given access to multiple sources of information. Due to this dramatic increase in available information, the perception of quality and credibility plays an important role in social media, making it necessary to discard low quality and uninterestin...
The study of the language used in Web 2.0 applications such as social networks, blogging platforms or on-line chats is a very interesting topic and can be used to test linguistic or social theories. However, the existence of language deviations such as typos, emoticons, abuse of acronyms and domain-specific slang makes any linguistic analysis challe...
User-generated content (UGC) has transformed the way that information is handled on-line. In this paradigm shift, users create, share and consume textual information that is likely to present informal features such as poor formatting, misspellings, phonetic transliterations, slang or lexical variants (Ritter et al., 2010). These texts found in soc...
As the Internet grows, it becomes essential to find efficient tools to deal with all the available information. Question answering (QA) and text summarization (TS) research fields focus on presenting the information requested by users in a more concise way. In this paper, the appropriateness and benefits of using summaries in semantic QA are analyz...
This paper presents two proposals based on semantic information, semantic roles and WordNet, for the answer extraction module of a general open-domain question answering (QA) system. The main objective of this research is to determine how the system performance is influenced by using this kind of information, and compare it with that of current QA...
We present Dossier–GPLSI, a system for the automatic generation of press dossiers for organizations. News items are downloaded from online newspapers and automatically classified. We describe specifically a module for the discrimination of person names. Three different approaches are analyzed and evaluated, each one using a different kind of inform...
The analysis of Web 2.0 texts is a relevant research topic nowadays. However, many problems arise when using current tools on this type of text. To be able to measure these difficulties, we first need to know the different registers or degrees of informality that we can find...
Social media publications are a popular and valuable source of information for Natural Language Processing applications. The linguistic analysis of these texts is a challenging task as a consequence of their informal nature. This paper explores the characterization of informality levels in Web 2.0 texts using unsupervised machine learning technique...
The study of text informality can provide us with valuable information for different NLP tasks. In the particular case of social media texts, their special characteristics like the presence of emoticons, slang or colloquial words can be used for obtaining additional information about their informality level. This paper demonstrates that the discov...
The analysis of Web 2.0 texts is a relevant research topic nowadays. However, many problems arise when using current tools on this type of text. To be able to measure these difficulties, we first need to know the different registers or degrees of informality that we can find...
Emerging project focused on the detection and interpretation of metaphors using unsupervised methods. It presents the characterization of the metaphor problem in Natural Language Processing, the theoretical foundations of the project, and the first results.
The analysis of Web 2.0 texts is a relevant research topic nowadays. However, many problems arise when using state-of-the-art tools on this kind of text. To be able to measure these difficulties, we first need to identify the different registers or informality levels that we can find. Therefore, in this paper we will attempt to characterize th...
Abstract: Social networks currently provide a mechanism through which a group of people can strengthen their communication, cooperate with one another on common tasks, and feel part of a community. These characteristics suggest that their use would be suitable in educational settings in order to strengthen various aspects such as: participat...
The contribution of semantic roles to question answering is considered to be very valuable. Due to this fact, the aim of this paper is to analyze the influence of semantic roles in this area. In order to achieve this goal a web QA system has been implemented using two different proposals for the answer extraction module based on semantic roles, and...
In recent years, improvements in automatic semantic role labeling have increased researchers' interest in its application to different NLP fields, especially to QA systems. We present a proposal of automatic generalization of the use of SR in QA systems to extract answers for different types of questions. Firstly, we have implemented two different...
This paper shows the results of adapting a modular domain English QA system (called IBQAS, whose initials correspond to Interchangeable Blocks Question Answering System) to work with both manual and automatic text transcriptions. This system provides a generic and modular framework using an approach based on the recognition of named entities as a m...
The question sets normally used to evaluate question answering (QA) systems consist mainly of questions whose answers are named entities (NE); therefore, most of these systems use entity recognizers to extract possible answers. Recently, semantic role labeling...
In this paper, a method to determine the semantic role for the constituents of a sentence is presented. This method, named SemRol, is a corpus-based approach that uses two different statistical models: conditional Maximum Entropy (ME) probability models and TiMBL, a memory-based learning program. It consists of three phases that make use of fea...
Lecture notes for the course Bases de Datos 1 (Databases 1).
In this paper an exhaustive evaluation of the behavior of the most relevant features used in Semantic Role Disambiguation tasks when the senses of the verbs are considered and when they are not, is presented. This evaluation analyzes the influence of Verb Sense Disambiguation in the task. In order to do this, a whole system of Semantic Role Labelin...
It is well known that Information Retrieval Systems based entirely on syntactic contents have serious limitations. In order to achieve high precision Information Retrieval Systems, the incorporation of Natural Language Processing techniques that provide semantic information is needed. For this reason, in this paper a method to determine the semantic...
In this paper, a methodology for selecting one of the best sets of features for a Semantic Role annotation process based on Machine Learning is proposed. The paper then presents how the selected set of features can be applied to two different Machine Learning systems, Maximum Entropy and TiMBL. The results will show the importance of a feat...
In order to achieve high precision Question Answering Systems or Information Retrieval Systems, the incorporation of Natural Language Processing techniques is needed. For this reason, in this paper a method to determine the semantic role for a constituent is presented. The goal of this is to integrate the method in a Question Answering System and...
In order to achieve high precision Question Answering Systems or Information Retrieval Systems, the incorporation of Natural Language Processing techniques is needed. For this reason, in this paper a method that can be integrated in these kinds of systems is presented. The aim of this method, based on maximum entropy conditional probability model...
In this paper, a supervised learning method of semantic role labeling is presented. It is based on maximum entropy conditional probability models. This method acquires the linguistic knowledge from an annotated corpus and this knowledge is represented in the form of features. Several types of features have been analyzed for a few words selected fro...
Funding body: MCyT (PROFIT project: FIT-150500-2002-411).
Database prototyping is a technique widely used both to validate user requirements and to verify certain application functionality. These tasks usually require the population of the underlying data structures with sample data that, additionally, may need to satisfy certain restrictions. Although some existing approaches have already automated th...
Access to the Information Society increasingly requires resources and tools with greater linguistic capabilities, especially semantic and conceptual ones. Along these lines, this paper presents a proposal for conceptual representation based on linguistic techniques. From this point of view, this paper propo...
Abstract: In this paper, the proposal and the method for annotating the 3LB corpus with semantic roles are presented. The semantic roles have been specified bearing in mind the application of the corpus to the development of Question Answering Systems. A semiautomatic method is followed using the 3LB-SeRAT tool...
Question sets normally used to evaluate QA systems are mainly based on questions whose answers are named entities; therefore, most of these systems rely on NERs to extract possible answers. Recently, semantic role labeling and its contribution to question answering have become an interesting issue. Nevertheless, NE-based systems will always...
Practical sessions workbook for the course Bases de Datos 1 (Databases 1).
The contribution of this work focuses on semantic analysis or interpretation, and more specifically on the process of semantic role annotation and its application to other NLP tasks. This contribution can be summarized in three main objectives: i) to investigate the sets of semantic roles and the linguistic resources defined on them...