Ana M Garcia-Serrano
National Distance Education University | UNED · Department of Computer Languages and Systems
PhD in Computer Science
About
169
Publications
23,523
Reads
1,116
Citations
Introduction
Ana M Garcia-Serrano currently works at the Department of Computer Science, UNED (Universidad Nacional de Educación a Distancia). Ana does research in Human-Computer Interaction, Natural Language Processing, Digital Humanities and Artificial Intelligence. Her current projects are Musacces (Digital Humanities) and 'New ontology-based semantic similarity measures, Information Content models and ontology-based IR models' (NLP & Information Retrieval).
Additional affiliations
July 2007 - present
September 1983 - June 2007
Publications
This work describes the language resources and models developed for automatic simplification of Spanish texts in three domains: Finance, Medicine and History studies. We created several corpora in each domain, annotation and simplification guidelines, a lexicon of technical and simplified medical terms, datasets used in shared tasks for the financi...
The task of Automatic Text Simplification (ATS) aims to transform texts to improve their readability and comprehensibility. Current solutions are based on Large Language Models (LLMs). These models have high performance but require powerful computing resources and large amounts of data to be fine-tuned when working in specific and technical domains....
In this paper, we present the results of a research experience of implementing andragogy in a learning environment designed to better meet the needs of adult learners studying part-time at a distance university. The learning environment was composed of a learning experience on a formal distance university online course that has been enriched with a...
Abstract: When digital material is available in a research or professional project in Digital Humanities (DH), the most promising tools for achieving the stated objectives must be identified. In this work, we offer some methodological guidance for getting to know the most innovative techniques and technology in the c...
Abstract: Analysing newspapers from the 18th, 19th and early 20th centuries requires digitised sources of a certain quality and the use of domain-specific or language-specific resources. Any approach using current technologies runs into the fact that most of the NLP models available for transcription or recogni...
This registered report introduces the largest, and for the first time, reproducible experimental survey on biomedical sentence similarity with the following aims: (1) to elucidate the state of the art of the problem; (2) to solve some reproducibility problems preventing the evaluation of most current methods; (3) to evaluate several unexplored sent...
This article analyses two quite different research efforts guided by artificial intelligence within the field of DH. The first is a well-known and successful investigation by two linguists who solve an authorship attribution case by building a digital corpus of 150 works by 40 novelists, Itali...
This registered report introduces the largest, and for the first time, reproducible experimental survey on biomedical sentence similarity with the following aims: (1) to elucidate the state of the art of the problem; (2) to solve some reproducibility problems preventing the evaluation of most of the current methods; (3) to evaluate several unexplored s...
Slides introducing the corresponding paper
This protocol introduces a set of reproducibility resources with the aim of allowing the exact replication of the experiments introduced by our main paper [1], which introduces the largest and for the first time reproducible experimental survey on biomedical sentence similarity. HESML V2R1 [2] is the sixth release of our Half-Edge Semantic Measures...
Background
Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are being extensively used in many applications in biomedical text mining and genomics respectively, which has encouraged the development of semantic measures libraries based on the aforementioned ontologies. However, current state-of-the-art semantic...
In the authorship verification task (PAN at CLEF 2021) the main aim is to discriminate between pairs of texts written by the same author or by two different authors. Our work focuses on extracting two stylometric features, character-level n-grams and the use of punctuation marks in the texts. Subsequently, we train a neural network with each of the...
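The two stylometric features named in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the choice of n = 3, and the use of cosine similarity to compare two texts are assumptions made for illustration only; the abstract states that a neural network is trained on the features, which is not reproduced here.

```python
from collections import Counter
import math
import string


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def punctuation_profile(text: str) -> Counter:
    """Count each ASCII punctuation mark that occurs in the text."""
    return Counter(ch for ch in text if ch in string.punctuation)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pair_features(text_a: str, text_b: str, n: int = 3) -> list[float]:
    """Hypothetical pair representation: one similarity score per feature family."""
    return [
        cosine(char_ngrams(text_a, n), char_ngrams(text_b, n)),
        cosine(punctuation_profile(text_a), punctuation_profile(text_b)),
    ]
```

In a verification setting, a vector like the one returned by `pair_features` could be passed to any binary classifier that decides "same author" versus "different authors".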
Measuring semantic similarity between sentences is a significant task in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and biomedical text mining. For this reason, the proposal of sentence similarity methods for the biomedical domain has attracted a lot of attention in recent years. However, most sentence similarity m...
Unsupervised Named Entity Recognition (NER) approaches do not depend on labelled data to function properly but rather on a source of knowledge in which promising candidates can be looked up to find the corresponding concept. In the biomedical domain, a knowledge source like this already exists, namely the Unified Medical Language System (UMLS). In th...
This work is a companion reproducibility paper of the experiments and results reported in Lastra-Diaz et al. (2019a), which is based on the evaluation of a companion reproducibility dataset with the HESML V1R4 library and the long-term reproducibility tool called Reprozip. Human similarity and relatedness judgements between concepts underlie most o...
There are some similarities in developing distance education online courses and Massive Open Online Courses (MOOCs) using the basis of eLearning instructional design. However, the task of converting an online course into a MOOC is not as simple as direct migration of eLearning materials and assessment resources into a MOOC platform. In online learn...
System and method for the indexing and retrieval of semantically annotated information units from a collection of semantically annotated indexed information units in response to a query using an ontology-based IR model. The retrieval method comprises: receiving a semantically annotated query with semantic annotations to individuals or classes withi...
People with disabilities still face limitations when accessing and enjoying artistic heritage. The work developed within the MUSACCES project aims to improve the relationship of people with disabilities with the artistic heritage in the Prado museum, trying to achieve a multisensorial user experience discovering formative narratives a...
Human similarity and relatedness judgements between concepts underlie most cognitive capabilities, such as categorisation, memory, decision-making and reasoning. For this reason, the proposal of methods for the estimation of the degree of similarity and relatedness between words and concepts has been a very active line of research in the fields...
This data article introduces a reproducibility dataset with the aim of allowing the exact replication of all experiments, results and data tables introduced in our companion paper (Lastra-Díaz et al., 2019), which introduces the largest experimental survey on ontology-based semantic similarity methods and Word Embeddings (WE) for word similarity re...
In social image retrieval, the main goal is to offer a relevant but also diverse result set of images to the user. To address relevance and diversity at the same time, we propose a multi-modal procedure. This approach deals with the diversification problem using a two-step procedure based on the application of Formal Concept Analysis (FCA) to organi...
Musacces project description, some comments on the state of the art and main objectives.
Abstract: Nowadays, language is no longer a barrier to plagiarising documents available on the Internet. Following now-classic probabilistic approaches that do not achieve good results on multilingual documents with paraphrasing (Barrón-Cedeño, 2012), work has appeared that uses knowledge graphs to increase the semantic capacity of the analys...
This paper presents how to enrich available linked data to facilitate the work of experts in the Musacces project (http://www.musacces.es/). A first step has been implemented in the current version of a prototype, MPOC1, that allows access to and visualization of information related to the Spanish Prado Museum (the target of the project)...
The digital humanities aim to facilitate access to and understanding of historical documents by means of software applications. In this process, the stage of formal, digital representation of the contents is important to facilitate their subsequent processing in applications for, e.g., access and visualization, search and organi...
The great challenge of 21st-century museology is attention to, and integration of, the most disadvantaged social sectors with museums. The work developed within the MUSACCES project aims to improve the relationship of people with disabilities with the artistic heritage in the Prado museum, trying to achieve a multisensorial user experience discover...
This work is a detailed companion reproducibility paper of the methods and experiments proposed by Lastra-Díaz and García-Serrano in [56, 57, 58], which introduces the following contributions: (1) a new and efficient representation model for taxonomies, called PosetHERep, which is an adaptation of the half-edge data structure commonly used to repre...
One of the great challenges of 21st Century Museology is the attention to and integration of the least favoured and disconnected social sectors with our museums. The museum institutions are living spaces whose managers try to bring about a more dynamic and open participation. The interest in educating and socialising culture has endowed the museums...
The Topic Detection task is focused on discovering the main topics addressed by a series of documents (e.g., news reports, e-mails, tweets). Topics, defined in this way, are expected to be thematically similar, cohesive and self-contained. This task has been broadly studied from the point of view of clustering and probabilistic techniques. In this...
There are some similarities in developing a traditional Higher Education (HE) eLearning course and MOOCs (Massive Open Online Courses), due to the use of the basis of eLearning instructional design. But in MOOCs, students should be continually influenced by information, social interactions and experiences forcing the faculty to come up with new app...
Although there are some similarities between developing a higher-education online course and a MOOC (Massive Open Online Course), since the teaching principles are very similar, in MOOCs students must be continually "encouraged" with new information, social interactions or different learning experiences, ...
In a recent paper, we introduce a new family of Information Content (IC) models based on the estimation of the conditional probability between child and parent concepts. This work is encouraged by the finding of two drawbacks in the computational method of our aforementioned family of IC models, as well as two other gaps in the literature. First gap...
The Topic Detection Task in Twitter represents an indispensable step in the analysis of text corpora and their later application in Online Reputation Management. Classification, clustering and probabilistic techniques have been traditionally applied, but they have some well-known drawbacks such as the need to fix the number of topics to be detected...
This paper presents a study of Spanish verbs and their potential to display images from Wikipedia (Wikimedia). An Information Retrieval model based on the linguistic structures of verbs is designed and developed, together with an environment that allows subsequent scaling to all Spanish verbs. Adesse and EuroWordNet are the linguistic resources se...
Social Accessibility is an approach to shorten the time for making web content more accessible by allowing ICT volunteers to improve its quality through collaborative work. In this context new forms of volunteerism have emerged through the creative and innovative use of ICTs. The first action in social ICT volunteering at UNED, the largest public d...
Probabilistic techniques, mainly Latent Dirichlet Allocation (LDA), have become almost a standard for content organization in the research field of Digital Humanities (DH). However, this kind of technique entails some problems, such as the need to fix the number of topics to be detected or the non-trivial interpretation of the obtained organiz...
This paper introduces a novel family of ontology-based similarity measures based on the Information Content (IC) theory, a detailed state of the art, a large experimental survey into ontology-based similarity measures on WordNet, and a new comparison between intrinsic and corpus-based IC models. Our experiments are based on our implementation of a...
We introduce and contextualise the content-based structuring carried out on the almost eight thousand records (structured documents), written in Spanish, of the Colección de Mapas, planos y dibujos of the Archivo General de Simancas (AGS), to support research in the DIMH project (HAR2012-31117).
This paper introduces a new family of intrinsic and corpus-based Information Content (IC) models for ontology-based similarity measures based on the IC theory, a detailed state of the art, an experimental survey of IC models and IC-based similarity measures on WordNet, and a comparison between intrinsic and corpus-based IC models. The family of IC...
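For orientation, the classic definitions that IC-based similarity measures build on can be written as follows. These are the standard formulation from the general IC literature (Resnik's measure), not the new models or measures introduced in the papers listed here; $p(c)$ denotes the probability of concept $c$, and $\mathrm{Anc}(c)$ is assumed to denote the set of ancestors of $c$ in the taxonomy, including $c$ itself.

```latex
\mathrm{IC}(c) = -\log p(c),
\qquad
\mathrm{sim}_{\mathrm{Resnik}}(c_1, c_2) =
  \max_{c \,\in\, \mathrm{Anc}(c_1) \cap \mathrm{Anc}(c_2)} \mathrm{IC}(c)
```

Intrinsic IC models estimate $p(c)$ from the structure of the taxonomy alone, whereas corpus-based models estimate it from concept frequencies in a text corpus.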
Abstract: Time is an element of capital importance in any information space, and Twitter is no exception. The exploitation of temporal information in information retrieval and organization tasks has a long tradition. However, content-based approaches of this kind have not been much explored for the domain...
Time is a crucial element in any information space, and Twitter is no exception. Although the exploitation of temporal information in retrieval and organization tasks has a long tradition, content-based approaches have not been fully explored for Twitter, and researchers lack sufficient corpora annotated with temporal information. In this pap...
In a recommendation task it is crucial to have an accurate content based description of the users and the items. Linked Open Data (LOD) has been demonstrated as one of the best ways of obtaining this kind of content. The main question is to know how useful the LOD information is in inferring user preferences and how to obtain it. We propose a novel...
We present the experiment carried out to enrich the content of a set of educational videos with LOD information. The functionalities developed are briefly described, and the integration of the available resources that facilitate content enrichment is justified. Finally, the results of the experimen...
A quantitative evaluation of digital resources is proposed, employing social labeling and an associated characteristic function that improves the retrieval of resources in digital repositories. The social metadata is described, the characteristic function is defined, and finally the results obtained by applying the proposed formula to the reviews from...
In Recommender Systems (RS), and especially in Content-Based RS (CBRS), both content and user modelling are especially important, since the performance of these systems, and consequently user satisfaction, is mostly based on the accuracy of the modelling step. This work proposes an innovative methodology to improve the representativeness of the co...
This paper summarizes the participation of UNED at the 2014 Retrieving Diverse Social Images Task [3]. We propose a novel approach based on Formal Concept Analysis (FCA) to detect the latent topics related to the images and a later Hierarchical Agglomerative Clustering (HAC) to put together the images according to these latent topics. The diversifi...
The main goal of this work is to show the improvement obtained by combining textual pre-filtering with image re-ranking in a Multimedia Information Retrieval task. The defined three-step retrieval process and a well-selected combination of visual and textual techniques help the developed Multimedia Information Retrieval System to overcome the...
Effective communication between users and systems is one of the main aims of the Multimedia Information Retrieval field. In this paper, the modality classification of images is used to expand user queries within the ImageCLEF Medical Retrieval collection provided by the organizers. Our main contribution is to show how and when results can be improved by...
In this paper we present our first participation in the RepLab campaign. Our work focuses on two contributions. The first is the use of an IR method to address the Polarity and Filtering tasks. These two tasks can be seen as the same problem: finding the most relevant class with which to annotate a given tweet. For that, we applied a classical IR approach, usi...
http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/4568
Presentation of the activities of the second programme of the MAVIR research network of the Madrid Region. © 2012 Sociedad Española para el Procesamiento del Lenguaje Natural.
Abstract: Multimedia information retrieval is one of the challenges currently faced on the web or in large collections of multimedia objects (audio, video, images and text). This article presents the experiments carried out to improve search quality in an annotated image collection such as IA...
Multimedia information retrieval is one of the main challenges addressed on the Web or in big multimedia collections (audio, video, image and text). This paper presents experiments to improve search quality in an annotated image collection such as IAPR TC-12. It is shown how multimedia fusion improves the monomodal search results based ju...
The main goal of this paper is to present our experiments in the modality classification and ad-hoc image retrieval tasks with the Medical collection at the ImageCLEF 2012 campaign. In this edition we focus on applying new strategies for both the textual and the visual subsystems included in our multimodal retrieval system. The visual subsystem has...
This paper describes and discusses an approach to extract and exploit enriched Named Entities for Image Photo Retrieval. The enrichment of Named Entities is inspired by the concept of definite description. The approach is evaluated using the ImageCLEF-08 test set for the photo retrieval task held at the Cross-Language Evaluation Forum in 2008. Results...
Virtual assistants are a promising business for the near future in the web era. This implies that the supporting applications have to be endowed with advanced capabilities to service offerings and to communicate with the users in a more direct and natural way. This paper presents the agent-based architecture of the virtual assistant and focuses on...
The main goal of this paper is to present our experiments in the ImageCLEF 2011 campaign (Medical Retrieval Task). In this edition we use textual and visual information, based on the assumption that the textual module better captures the meaning of a topic. Thus, the TBIR module works first and acts as a filter, and the CBIR system reorders the tex...
The main goal of this paper is to present our experiments in the ImageCLEF 2011 campaign (Wikipedia retrieval task). In this edition we focused on applying different strategies for merging multimodal information, textual and visual, following both early and late fusion approaches. Our best runs are in the top ten of the global list, at positions 8, 9 an...
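The filter-then-reorder flow described in the abstract above (a text-based module shortlists candidates, a content-based module reorders them) can be sketched as follows. This is an illustrative assumption, not the system from the paper: the function name, the `keep` cut-off, the score dictionaries, and the choice to reorder purely by visual score are all hypothetical.

```python
def filter_then_rerank(text_scores: dict[str, float],
                       visual_scores: dict[str, float],
                       keep: int = 1000) -> list[str]:
    """Shortlist the `keep` best documents by text score, then reorder by visual score.

    Documents missing from `visual_scores` are treated as having a score of 0.0.
    """
    # Step 1 (textual filter): keep only the top-`keep` documents by text score.
    shortlisted = sorted(text_scores, key=text_scores.get, reverse=True)[:keep]
    # Step 2 (visual re-ranking): reorder the shortlist by visual score.
    return sorted(shortlisted, key=lambda doc: visual_scores.get(doc, 0.0), reverse=True)


if __name__ == "__main__":
    text = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7}
    visual = {"a": 0.2, "b": 0.6, "c": 0.99, "d": 0.4}
    # "c" has the best visual score but is filtered out by the textual step.
    print(filter_then_rerank(text, visual, keep=3))
```

The point of the design is that a document with a strong visual score but a weak textual match never reaches the final list, which reflects the stated assumption that the textual module better captures the meaning of a topic.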
In this paper we present the second participation of the NLP&IR group at UNED in the MediaEval Genre Tagging Task. This categorization task was carried out applying an Information Retrieval (IR) approach considering the video collection's textual data and query expansion techniques. The results show that the combination of social tags and language...
The main goal of this paper is to present our experiments in the ImageCLEF 2010 campaign (Wikipedia retrieval task). In this edition we present a different way of using textual and visual information, based on the assumption that the textual module better captures the meaning of a topic. Thus, the TBIR module works first and acts as a filter, and t...
This paper presents Q-WordNet, a lexical resource consisting of WordNet senses automatically annotated with positive and negative polarity. Polarity classification amounts to deciding whether a text (sense, sentence, etc.) may be associated with positive or negative connotations. Polarity classification is becoming important for applications such as Opin...
In this paper we present the first participation of the NLP&IR group at UNED in the Tagging Task (Professional Version): prediction of semantic theme. This categorization task was carried out by an information retrieval approach, together with language models and clustering using only metadata associated with the videos. The results show that langu...