Xavier Gómez Guinovart’s research while affiliated with University of Vigo and other places
Language resources are necessary for language processing, but building them is costly, involves many researchers from different areas, and needs constant updating. In this paper, we describe the crosslingual framework used for developing the Multilingual Central Repository (MCR), a multilingual knowledge base that includes wordnets of Basque, Catalan, English, Galician, Portuguese, and Spanish, together with the following ontologies: Base Concepts, Top Ontology, WordNet Domains and Suggested Upper Merged Ontology. We present the history of MCR, its state in 2017 and the developed tools.
The purpose of this paper is to present the LITTERA corpus, an English-Spanish literary parallel speech corpus created for the purpose of language learning, and to sketch out a few pedagogical applications for the study of English phonology by Spanish-speaking language learners. It is composed of 25 literary texts that have been aligned with the Spanish translation and are accompanied by audio from the corresponding audiobooks. In this article, we will detail its conception, composition and features at length, as well as provide a few examples of how LITTERA can be applied in language learning, particularly within the realm of oral comprehension and speech production.
This study describes the methodology used in the development of a WordNet lexicon for the Galician language, and its applications for language processing in the fields of terminology acquisition and ontology learning and management. First, we review the Princeton WordNet lexical model, its multilingual adaptation in the EuroWordNet framework, and its implementation in the Galician WordNet building. Second, we discuss the approach and the resources used in the design of Termonet, a tool for checking and verifying in technical corpora the specialty lexicons embedded in WordNet. This tool performs an identification of the synsets in WordNet belonging to a terminological domain from the semantic relations between the nodes of the lexical network, and validates the terms by means of a semantically disambiguated specialized corpus. Third, we analyze the process of construction of a new semantic categorization of WordNet based on epinonyms and generated automatically by exploring the relations from a terminological perspective. A WordNet epinonym is a noun synset in the semantic network representing the category of the semantic domain to which other synsets will be automatically assigned by algorithms that will evaluate their proximity from a terminological point of view through the cognitive processing of the lexical-semantic relations. Last, we present some applications of the RDF Galician WordNet in the Semantic Web by means of federated queries with lexical and ontological resources available as Linked Open Data (LOD) like DBpedia, BabelNet, Wiktionary and YAGO.
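As a rough illustration of the domain-identification step described above, the following Python sketch collects the synsets reachable from a domain root through a semantic relation and then keeps only those attested in a sense-tagged corpus. It is not Termonet's actual algorithm: the synset identifiers, the HYPONYMS dictionary, the CORPUS_SYNSETS set, and both function names are invented for illustration, and only hyponymy is followed, whereas the abstract refers to semantic relations in general.

```python
from collections import deque

# Hypothetical toy network: synset id -> list of hyponym synset ids.
HYPONYMS = {
    "disease.n.01": ["infection.n.01", "cancer.n.01"],
    "infection.n.01": ["flu.n.01"],
    "cancer.n.01": [],
    "flu.n.01": [],
    "animal.n.01": ["dog.n.01"],
    "dog.n.01": [],
}

# Hypothetical set of synsets observed in a disambiguated medical corpus.
CORPUS_SYNSETS = {"disease.n.01", "flu.n.01", "dog.n.01"}


def domain_synsets(root, relations):
    """Collect the root synset and everything reachable from it (breadth-first)."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in relations.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def validate(candidates, corpus_synsets):
    """Keep only the candidate synsets that are attested in the corpus."""
    return candidates & corpus_synsets


candidates = domain_synsets("disease.n.01", HYPONYMS)
validated = validate(candidates, CORPUS_SYNSETS)
print(sorted(validated))
```

In this toy setting, "dog.n.01" occurs in the corpus but is never proposed as a term, because it is not reachable from the domain root; the corpus is used only to confirm candidates that the network has already suggested.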
Although we are not superstitious, we cannot help linking Linguamática's thirteenth year of life with its sudden drop in the SCImago index to the third quartile. We hope that over the coming years we will manage to recover, placing Linguamática among the best journals in the field.
This issue contains only two articles: one on machine translation from Portuguese into Portuguese Sign Language, and another presenting a study of a Spanish-English-Chinese corpus.
As always, our sincere thanks to all the reviewers and authors who continue to believe in this project of ours and yours.
This paper presents the different methodologies and resources used to build Galnet, the Galician version of WordNet. It reviews the different extraction processes and the lexicographical and textual sources used to develop this resource, and describes some of its applications in ontology research and terminology processing.
Abstract: In this presentation, we review the methodology used in the development of the SensoGal Corpus, an English-Galician parallel corpus semantically tagged with WordNet 3.0 and based on the English SemCor.
1 Introduction
This article (note 1) describes the methodology used to create the SensoGal Corpus (note 2), an English-Galician parallel corpus semantically tagged with WordNet 3.0 and based on the English SemCor corpus. The resource is being built within the TUNER project, which focuses on developing multilingual resources (English, Spanish, Catalan, Basque, and Galician) for processing documents in specific domains using semantically based language technologies. For Galician, the project's goals include developing the WordNet for the language, linked to the Multilingual Central Repository (MCR) (González Agirre, Laparra, and Rigau, 2012), and building a semantically tagged Galician corpus aligned with the English SemCor corpus (Landes, Leacock, and Tengi, 1998).
2 Alignments with SemCor
The English SemCor corpus is a textual corpus semantically annotated at the lexical level. The words in this corpus are tagged with an indication of the specific sense they have in the context where they occur. The annotations indicate the senses established in version 1.6 of the English WordNet, a lexical resource developed by the same Princeton University team that annotated the SemCor corpus (Miller et al., 1990). SemCor consists of 360,000 words distributed across 352 texts taken from the Brown Corpus. It is the largest freely available, semantically annotated general corpus of a language, with 192,639 words with lexical meaning (nouns, verbs, adjectives, and adverbs) annotated with their WordNet sense (note 3). Of these 352 texts, only 186 are fully annotated with part of speech, lemma, and sense, while in 166 only the verbs are semantically annotated. There are several projects to create parallel corpora aligned with the English SemCor, notably the English-Italian MultiSemCor corpus, composed in its version 1.1 of 116 texts in English ...
Note 1: This research is carried out within the TUNER research project (TIN2015-65308-C5-1-R), funded by the Spanish Government's Ministry of Economy and Competitiveness and the European Regional Development Fund (MINECO/FEDER, EU).
Note 2: http://sli.uvigo.gal/SensoGal/
Note 3: Compared with SemCor, the annotated gloss corpus of the English WordNet, also produced by the Princeton University team, is larger in quantitative terms, but because it is a corpus of definitions it contains text in a metalinguistic register with very specific characteristics, and so should properly be regarded as a specialized corpus.
In this presentation, we review the methodology used in the development of the Galician DBpedia and some of its applications for language processing in the fields of entity recognition and lexical extraction.
... Multimedia corpora are mainly used in linguistics for the analysis of dialog patterns and the relationship between speech and nonverbal communicative means (gestures, facial expressions, eye movements, etc.) [10]. Such corpora are convenient for teaching languages and literature [11], in translation and interpretation studies [12], and in age psychology. One of the modern directions of research is the application of corpora for creating computer systems when the behavior and communication of real people are analyzed by videos and transferred to computer characters or to robots interacting with humans [13]. ...
... Regarding LS in Spanish, few approaches are reported in the literature. They can be classified as: (i) knowledge-based approaches, which rely on "curated" lists of synonyms and corpora to propose and rank synonyms by frequency and other word characteristics (Bott et al., 2012a; Baeza-Yates et al., 2015; Ferrés et al., 2017a); (ii) translation-based approaches, which cast simplification as translation (Stajner (2014) and Štajner et al. (2019) implicitly learn simplification rules); and (iii) current transformer-based approaches, which achieve state-of-the-art performance. In the context of the TSAR 2022 Lexical Simplification challenge (Saggion et al., 2022), several approaches have been proposed, mostly based on pre-trained language models. ...
... Both ontological features and lexical prototyping are key concepts that allow the easy semiautomatic extraction of lexical selections and future lexical expansion because they are fundamental for the linking of our classes to the semantic attributes present in WordNet (Gómez Guinovart, 2011; Gómez Guinovart and Solla Portela, 2018). ...
... • GalNet (Gómez Guinovart, 2011) (Simões and Gómez Guinovart, 2014) has been under development for five years at the Polytechnic Institute of Cávado and Ave and University of Minho. Given the lack of human resources, the database was created using automatic methods, and no manual revision was performed on the data. ...
... The Galnet interface also provides all its contents as RDF resources through a SPARQL endpoint with free public access, so that users can explore the data using SPARQL queries [34]. ...
... Guinovart and Simões [7] present another parallel corpora-based bilingual terminology extraction method based on the occurrence of bilingual morpho-syntactic patterns in probabilistic translation dictionaries. ...
... Besides this resource, the experiments carried out with Reciclagem include other knowledge networks with greater coverage, namely CARTÃO (Oliveira et al., 2011), which in turn consists of other resources such as PAPEL and the relations extracted from the Dicionário Aberto (Simões et al., 2012), as well as different variants of the Portuguese WordNet: OpenWordNet-PT (de Paiva et al., 2012) and PULO (Simões & Guinovart, 2014). ...
... Moreover, the PRESEEA-Valledupar corpus is the first speech corpus of this city and one of the first corpora collected in Colombia and the Caribbean, so any research on the speech of Valledupar should treat it as a reference point. It should also be noted that this corpus has been underused, since so far the only previous research on phonetic variation is the study by Olmos and Gómez (2012). In this sense, the present study of the PRESEEA-Valledupar corpus makes it possible to broaden the characterization of this speech community and to establish comparisons with other cor- ...
... words (953 texts), which is part of the Corpus Técnico do Galego (http://sli.uvigo.es/CTG) (Crespo et al., 2008; Gómez Guinovart, 2007 (2000). The first of the works cited, the Léxico do medio, is the only work in Galician that has a specific lexicon devoted to the environmental sciences. ...