Project

Cognitive and Neurological Bases for Terminology-enhanced Translation (CONTENT)

Goal: The main objective of CONTENT is to fully exploit the contents and components of EcoLexicon for purposes of translation and natural language processing. Accordingly, this project will create and implement a prototype for the terminology-enhanced translation of specialized environmental texts. This means expanding the architecture of the relational database where EcoLexicon is stored, as well as enriching the following modules: (i) the linguistic module (inclusion of relations between terms); (ii) the conceptual module (specification of non-hierarchical relations based on paradigms encoded in the phraseological module); and (iii) the phraseological module (expansion of syntagmatic relations enhanced with different types of collocational information). This prototype will make EcoLexicon's data available online to users in contexts that provide a selection of semantic, syntactic, and pragmatic information specifically related to the terms in the source text. Enhancing the modules in EcoLexicon and designing the prototype require more effective terminographic methods and extensive semi-automatic processing of the corpus based on the extraction of knowledge patterns.
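The prototype is meant to show users EcoLexicon information tied to the terms that occur in a source text. The sketch below shows only the lookup step and rests on several assumptions: whitespace tokenization, and a hypothetical in-memory termbase. `TERMBASE`, `annotate`, and the entries are illustrative and do not reflect EcoLexicon's schema or API. The lookup uses longest match, so a multiword term takes precedence over its head.

```python
from typing import Dict, List, Tuple

# Hypothetical mini termbase: source term -> target equivalents.
# Entries are illustrative only, not taken from EcoLexicon.
TERMBASE: Dict[str, List[str]] = {
    "erosion": ["erosión"],
    "coastal erosion": ["erosión costera"],
    "wind farm": ["parque eólico"],
}

def annotate(text: str, termbase: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Return (term, equivalents) pairs found in text, preferring the longest match."""
    tokens = text.lower().split()
    max_len = max(len(term.split()) for term in termbase)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so "coastal erosion" beats "erosion".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in termbase:
                found.append((candidate, termbase[candidate]))
                i += n
                break
        else:
            # No term starts at this token; move on.
            i += 1
    return found

print(annotate("Coastal erosion threatens the wind farm", TERMBASE))
```

Longest match is a common default for termbase lookup. A fuller version would also lemmatize tokens and strip punctuation.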
Parallel to the design and implementation of the prototype, another goal is to facilitate the interoperability of EcoLexicon by linking it to other resources by means of Linked Data, a technology that publishes structured data and links them to information in other resources in compliance with Semantic Web standards. The data in the formal ontology version of EcoLexicon will thus be accessible by means of SPARQL queries. Furthermore, EcoLexicon data will be linked to the information in resources such as GEMET, AGROVOC, and DBpedia, with a view to offering an open resource integrated in the Semantic Web or, more concretely, in the Linguistic Linked Open Data cloud. The linking process will be performed semi-automatically by means of RDF Schema properties (e.g., rdfs:seeAlso), OWL properties (e.g., owl:sameAs), and SKOS properties (e.g., skos:broader). In this way, the conceptual and linguistic information in EcoLexicon can be transformed into a disambiguation resource. At the same time, the content of the linked resources will be exploited in the terminology-enhanced prototype for assisted specialized translation.
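The linking step can be pictured as emitting triples that use the RDF Schema, OWL, and SKOS properties named above. The sketch below serializes such links as N-Triples using only the standard library. It assumes hypothetical `example.org` URIs in place of real EcoLexicon, GEMET, and DBpedia identifiers, which the project text does not give. The property namespaces are the standard W3C ones.

```python
# Standard W3C namespaces for the three linking properties.
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"

def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one triple whose three terms are all IRIs in N-Triples syntax."""
    return f"<{subject}> <{predicate}> <{obj}> ."

# Hypothetical concept URIs: example.org stands in for the real resources.
LINKS = [
    ("http://example.org/ecolexicon/erosion", OWL_SAME_AS,
     "http://example.org/gemet/erosion"),
    ("http://example.org/ecolexicon/coastal_erosion", SKOS_BROADER,
     "http://example.org/ecolexicon/erosion"),  # erosion is broader than coastal erosion
    ("http://example.org/ecolexicon/erosion", RDFS_SEE_ALSO,
     "http://example.org/dbpedia/Erosion"),
]

for s, p, o in LINKS:
    print(ntriple(s, p, o))
```

Once loaded into a triple store, links like these become reachable through ordinary SPARQL graph patterns.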
Finally, the inventory of semantic relations in EcoLexicon and its underlying conceptual structure will be validated by an fMRI study, based on the successful results of a previous pilot study (Faber et al. 2014). The study will focus on the representation, storage, and processing of specialized concepts and their semantic relations. Following Muelhaus et al. (2014), the subjects (experts and non-experts) will be presented with different stimuli (images, terminological designations, and terms associated with different types of relation). The aim is to analyze which type of semantic relation most facilitates the comprehension of a concept, and whether vertical and horizontal semantic relations produce different brain activation patterns.

Project log

Melania Cabezas-García
added a research item
Translating multiword terms can pose problems at different levels. First, they need to be correctly identified and understood in the source text before they can be translated. However, the solutions to these problems cannot always be found in terminological resources such as dictionaries or databases. Translators must therefore use other information-rich resources such as corpora. To make the most of them, translators must master the diverse query techniques available in current corpus analysis tools. However, these techniques are often unknown, which results in the reluctance of many translators to use corpora (Bowker 2004; Gallego-Hernández 2015; Loock 2016). This paper presents a step-by-step protocol that facilitates the comprehension and translation of multiword terms by means of parallel and comparable corpora. The procedure is illustrated with English multiword terms, which are translated into French and Spanish.
Melania Cabezas-García
added a research item
Multiword terms can be problematic when translating specialized texts. Their treatment involves correctly identifying and understanding them, as well as translating them into the target language. Terminological resources do not always facilitate these tasks, due to particular characteristics of multiword terms or the nature of the resources themselves. Translators must therefore use other resources, such as corpora. However, this requires knowledge of corpus querying, which not all translators have mastered, and this often results in reluctance to use corpora (Bowker 2004; Gallego-Hernández 2015). With a view to facilitating multiword term management, this study describes a step-by-step protocol for understanding and translating these terms from English into Spanish using parallel and comparable corpora.
1. INTRODUCTION
The treatment of multiword terms (e.g., UV-absorbing aerosol), which are the main phraseological units of specialized discourse, is one of the major stumbling blocks in any translation project. This treatment generally involves three phases, each with its own difficulties:
- identification (e.g., delimiting the multiword term);
- comprehension (e.g., structural and semantic disambiguation);
- rendering in the target language (TL) (e.g., searching for equivalents and discriminating between variants).
The first step in answering these questions is, logically, to consult terminological resources. However, these terms are not always included, or they are included unsystematically. Furthermore, as Bowker (2011) argues, translators no longer consult resources with the same "blind faith" as before. Many therefore use the proposals they find as a starting point for new searches in other resources. Translators must consequently master various techniques for solving the problems posed by multiword terms, using tools such as corpora. Traditionally, translators have resorted to parallel texts.¹
¹ Sánchez Gijón (2009) defines parallel texts as those texts that, in relation to the source text, provide information about the textual conventions or the particularities of linguistic usage
Juan Rojas-Garcia
added a research item
The description of named entities in terminological knowledge bases has never been addressed in any depth in terminology. Firm preconceptions, rooted in philosophy, that proper names have a purely referential function have presumably led to their exclusion from terminological resources. This is despite the fact that prominent figures in the discipline of terminology have highlighted the relevance of named entities. Scholars from different branches of linguistics have departed from the conservative stance on proper names. They have foregrounded the need for a novel approach to describing them, more linguistic than philosophical. Therefore, this paper proposed a linguistic and terminological approach to the study of named entities used in scientific discourse. The purpose was to represent them in EcoLexicon, an environmental knowledge base designed according to the premises of Frame-based Terminology. We focused more specifically on named rivers (or potamonyms) mentioned in a coastal engineering corpus. Including named entities in terminological knowledge bases requires analyzing the contexts that surround them in specialized texts, because these contexts convey specialized knowledge about the named entities. For the semantic representation of context, this paper thus analyzed the local syntactic and semantic contexts surrounding potamonyms in coastal engineering texts. It also described the semantic annotation of the predicate-argument structure of sentences in which a potamonym was mentioned. The following semantic variables were annotated: (1) semantic category of the arguments; (2) semantic role of the arguments; (3) semantic relation between the arguments; and (4) lexical domain of the verbs. This method yielded valuable insight into several aspects: the different semantic roles that named rivers played, the entities and processes that participated in the events evoked by potamonyms through verbs, and how they all interacted.
Furthermore, since arguments are specialized terms and verbs are relational constructs, the analysis of argument structure led to the construction of semantic networks that depicted specialized knowledge about named rivers. These conceptual networks were then used to craft the thematic description of potamonyms. Accordingly, the semantic network and the thematic description not only constituted the representation of a potamonym in EcoLexicon, but also allowed the geographic contextualization of specialized concepts in the terminological resource.
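The four annotation variables, and the step from predicate-argument structures to a semantic network, can be sketched as a small data model. Several elements are hypothetical: the class names, the category and role labels, and the example annotation. The rule of linking the first argument to every other argument is also a simplifying assumption, not the paper's procedure.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass
class Argument:
    text: str       # the argument as it appears in the sentence
    category: str   # (1) semantic category of the argument
    role: str       # (2) semantic role of the argument

@dataclass
class Predication:
    verb: str
    lexical_domain: str        # (4) lexical domain of the verb
    arguments: List[Argument]
    relation: str              # (3) semantic relation between the arguments

def network_edges(predications: List[Predication]) -> Set[Tuple[str, str, str]]:
    """Turn annotated predications into (source, relation, target) edges,
    linking the first argument to each of the remaining ones."""
    edges = set()
    for p in predications:
        if not p.arguments:
            continue  # nothing to link
        head, *rest = p.arguments
        for arg in rest:
            edges.add((head.text, p.relation, arg.text))
    return edges

# Hypothetical annotation of a sentence such as
# "The Mississippi River transports sediment ..."
example = Predication(
    verb="transports",
    lexical_domain="movement",
    arguments=[Argument("Mississippi River", "WATER_BODY", "agent"),
               Argument("sediment", "MATERIAL", "patient")],
    relation="affects",
)
print(network_edges([example]))  # {('Mississippi River', 'affects', 'sediment')}
```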
Melania Cabezas-García
added a research item
Context, especially cultural context, has long been neglected in Terminology. Even though recent approaches have acknowledged the relevance of culture in specialized communication, its treatment in Terminology is still marginal. Culture is also underrepresented in terminological resources, which may be due to the difficulty of reflecting the cultural component in the description of terms and concepts. However, conceptualization is dynamic and changes from culture to culture. For that reason, an in-depth study is needed of how human perception and cultural cognition influence the representation of concept systems and terms in specialized knowledge contexts. Furthermore, to facilitate knowledge acquisition, contextual and conceptual information should be accompanied by multimodal information, as the combination of textual and visual material improves understanding. This study integrates different types of context (i.e., semantic relations, frames, and culture) to describe a methodology for selecting and representing multimodal information for culturally bound concepts, such as FOREST, in terminological knowledge bases. The methodology is based on the theoretical premises of Frame-Based Terminology. Different ideas of forest in European countries were analyzed and represented by means of culturally adapted images, which are best suited to disseminating knowledge and foregrounding the role of culture in specialized communication.
Antonio San Martín
added a research item
The purpose of a terminological definition is to represent in natural language the most relevant knowledge associated with a term. However, the knowledge activated by a term (i.e., its meaning) varies according to the usage context. Since context is indispensable in meaning construction, it should guide terminological definition writing. Nonetheless, the recommendation is still that a terminological definition should represent a concept's necessary and sufficient characteristics, which are regarded as context-independent. This paper proposes a parametrization of the contextual constraints applicable to terminological definitions so that context can be accounted for in them. To this end, the notions of premeaning and precontext are introduced, and different types of contextual constraints (linguistic, thematic, cultural, etc.) are discussed. We argue that the conscious application of contextual constraints by the terminologist helps to produce more useful definitions and to avoid inconsistencies and biases.
Melania Cabezas-García
added a research item
Caduceus: Publication of the Medical Division of the American Translators Association. How can translators understand expert knowledge and translate it into other languages? One answer is terminology management, which ensures that the correct terms are used consistently throughout a company, an organization, or a translation or terminology project. Its benefits range from improved translations to cost reduction, effective communication, the internationalization of companies, and information retrieval, among many others. In contrast, failure to manage terminology can hinder communication, create confusion, lower translator productivity, or even result in legal issues. This article describes a general framework for managing terminology in specialized translation contexts, such as medical translation. Although a chronological, step-by-step procedure is presented, these techniques may not be applied sequentially in the actual workflow. The steps described include methods for: (i) corpus preparation and compilation; (ii) term extraction and conceptual analysis; (iii) identification of equivalents; and (iv) representation and storage in terminology management systems.
Juan Carlos Gil-Berrozpe
added a research item
Attributes are basic to conceptualization because they expand the meaning of concept types, such as entities and events. They are often a constituent part of multi-word terms (MWTs), which represent specialized concepts in a given knowledge domain. Since attributes contain hyponymic nuances that make MWTs different from the single-word terms to which they are linked, hyponymy is intrinsically associated with the phenomenon of MWT formation. This paper presents a corpus-driven study that was performed to explore the hyponymic behavior of botanical terminology from an attribute-based approach. Additionally, a semantic analysis was carried out to distinguish the codified semantic relations and the hyponymic nuances of the attributes in botanical MWTs. Finally, based on the data, the most relevant hyponymy subtypes in the botanical corpus were assessed. Our results showed that describing the semantic information provided by attributes could lead to a more comprehensive representation of hyponymic MWTs in lexicographic and terminological resources.
Melania Cabezas-García
added a research item
In specialized language, multiword terms (MWTs) are one of the most frequent term types. Although MWTs are highly relevant to conceptual systems and specialized knowledge transmission, they are not simple to analyze. This chapter presents a corpus-based analysis of a set of English MWTs in the domain of wind energy. The results obtained explain the conceptual development of these terms, and highlight the usefulness of this process for translation. This analysis first disambiguated the structure of these MWTs. The components of each term were then assigned semantic categories, and their internal semantic relations were made explicit. This revealed the microcontext of each MWT, which not only explains their formation and productivity, but also facilitates their translation.
Melania Cabezas-García
added a research item
Multiword terms are a type of lexical unit that is especially common in specialized discourse. They condense scientific and technical knowledge, so their correct treatment is essential for the transmission of information. However, their analysis is complicated. In this article, corpus techniques are used to investigate the conceptual and pragmatic formation of English multiword terms from the domain of wind energy. First, the structure of the multiword terms was disambiguated, and their components were then assigned semantic categories. After the internal relation in each multiword term was analyzed, the terms were annotated with semantic roles. This analysis revealed the microcontext of each term. The microcontext explains the productivity of multiword term formation and facilitates various cognitive and discursive tasks, such as translation.
Melania Cabezas-García
added a research item
Complex nominals (CNs) are frequently found in specialized discourse in all languages, since they are a productive method of creating terms by combining existing lexical units. In Spanish, a conceptual combination may often be rendered with a prepositional CN (PCN) or an equivalent adjectival CN (ACN), e.g., demanda de electricidad vs. demanda eléctrica [electricity demand]. Adjectives in ACNs, which are usually derived from nouns, are known as 'relational adjectives' because they encode semantic relations with other concepts. With recent exceptions, research has focused on the underlying semantic relations in CNs. In natural language processing, several works have dealt with the automatic detection of relational adjectives in Romance and Germanic languages. However, to our knowledge, there are no discourse studies of these CNs aimed at establishing writing recommendations. This study analyzed the co-text of equivalent PCNs and ACNs to identify the factors governing the use of one form or the other. EcoLexicon ES, a corpus of Spanish environmental specialized texts, was used to extract 6 relational adjectives and, subsequently, a set of 12 pairs of equivalent CNs. Their behavior in co-text was analyzed by querying EcoLexicon ES and a general language corpus with 20 expressions in CQP syntax. Our results showed that the immediate linguistic co-text determined the preference for a particular structure. Based on these findings, we provide writing guidelines to assist in the production of CNs.
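The 20 CQP expressions used in the study are not listed here. As a hypothetical illustration, query templates for a PCN/ACN pair might look like the sketch below, which assumes a corpus annotated with `word` and `lemma` token attributes. The function names are illustrative.

```python
def pcn_query(noun: str, base_noun: str) -> str:
    """CQP query for a prepositional CN, e.g. 'demanda de electricidad'."""
    return f'[lemma="{noun}"] [word="de"] [lemma="{base_noun}"]'

def acn_query(noun: str, rel_adj_lemma: str) -> str:
    """CQP query for the equivalent adjectival CN, e.g. 'demanda eléctrica'.
    The adjective is matched by its lemma, so 'eléctrica' is found via 'eléctrico'."""
    return f'[lemma="{noun}"] [lemma="{rel_adj_lemma}"]'

print(pcn_query("demanda", "electricidad"))
print(acn_query("demanda", "eléctrico"))
```

In CQP, each bracketed expression matches one token. Prepending or appending further token expressions to these templates would capture the immediate co-text that the study examined.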
Melania Cabezas-García
added a research item
Multiword terms are one of the main difficulties in translating specialized texts. These terms are not always included in terminological resources, or are covered only partially. Translators and terminologists must therefore master various techniques to resolve the difficulties they pose. Traditionally, parallel texts have been used to extract terminology and acquire specialized knowledge. These texts can be compiled as comparable corpora (i.e., texts originally written in the source and target languages), which make it possible to exploit a larger number of texts more efficiently. Nevertheless, there is often reluctance to use them, in some cases because of a lack of knowledge of query techniques. This study describes a methodology for identifying, in comparable corpora, the English equivalents of Spanish multiword terms. The methodology has two aims: (1) to facilitate this task in the daily work of translators and terminologists; and (2) to provide them with manual corpus extraction techniques. To this end, we analyzed a comparable corpus of specialized texts on wind energy originally written in Spanish and English. The proposed methodology is based on distributional semantics. It consists of identifying and comparing the contextual elements of concepts in both languages. The results show that this procedure simplifies the identification of equivalents of multiword terms in corpora.
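The abstract describes comparing the contextual elements of concepts across languages. One common distributional way to automate that comparison works in three steps: build context vectors, map source-language contexts through a bilingual seed lexicon, and rank target-language candidates by cosine similarity. This technique is swapped in here for illustration and is not necessarily the authors' (manual) procedure. Multiword terms are assumed to be pre-joined into single tokens (e.g. `parque_eólico`). The toy corpora and the seed lexicon are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def context_vector(corpus: List[List[str]], target: str, window: int = 3) -> Counter:
    """Count words occurring within `window` tokens of each occurrence of target."""
    counts: Counter = Counter()
    for sent in corpus:
        for i, tok in enumerate(sent):
            if tok == target:
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                counts.update(w for j, w in enumerate(sent[lo:hi], start=lo) if j != i)
    return counts

def translate_vector(vec: Counter, seed: Dict[str, str]) -> Counter:
    """Map source-language context words into the target language via a seed
    lexicon, dropping words the lexicon does not cover."""
    out: Counter = Counter()
    for w, c in vec.items():
        if w in seed:
            out[seed[w]] += c
    return out

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical toy data.
es = [["el", "parque_eólico", "genera", "electricidad", "con", "turbinas"]]
en = [["the", "wind_farm", "generates", "electricity", "with", "turbines"],
      ["the", "power_plant", "burns", "coal"]]
seed = {"el": "the", "genera": "generates", "electricidad": "electricity",
        "con": "with", "turbinas": "turbines"}

src = translate_vector(context_vector(es, "parque_eólico"), seed)
ranking = sorted(["power_plant", "wind_farm"],
                 key=lambda c: cosine(src, context_vector(en, c)), reverse=True)
print(ranking)  # ['wind_farm', 'power_plant']
```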
Antonio San Martín
added a research item
To formulate definitions that meet user needs, terminologists and specialised lexicographers must know how to effectively select information. However, most definition writing guidelines are based on the specification of necessary and sufficient characteristics, which has serious drawbacks because it downplays the role of context (understood as any factor affecting how a term is interpreted) in specialised meaning construction. This paper focuses on thematic variation, an important type of contextual variation, and its representation in terminological definitions. To this end, this paper presents a corpus-based approach to writing definitions that takes into account thematic variation in the selection of information.
Melania Cabezas-García
added a research item
Multiword terms (MWTs) are frequently consulted in terminological resources due to their structural, cognitive, and conceptual complexity. However, in most terminological resources they are not always well described, since they are often included as independent term entries with no information on how their constituents are related. An accurate management of MWTs of three or more constituents requires, as a first step, their structural disambiguation, also called bracketing. This paper examines MWT bracketing in order to enhance MWT representation by describing their structural dependencies. Based on NLP advances in bracketing, a protocol has been designed through corpus queries and evaluated according to the reliability of corpora and rules as well as the causes underlying failure. Automatising bracketing can help enhance the representation of MWTs in terminological knowledge bases, assisting both the terminologist and the final user, since making their relational structure explicit can favour knowledge acquisition.
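For a three-constituent MWT, one bracketing rule from NLP is the adjacency model. It compares how strongly the first pair (w1 w2) and the second pair (w2 w3) are attested, and groups the stronger pair. The sketch below uses raw bigram counts and assumes that ties default to left bracketing. The toy corpus and function names are hypothetical, and the paper's protocol may rely on other corpora, association measures, and rules.

```python
from collections import Counter
from typing import List

def bigram_counts(tokens: List[str]) -> Counter:
    """Count adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))

def bracket(w1: str, w2: str, w3: str, bigrams: Counter) -> str:
    """Adjacency model: left-bracket [[w1 w2] w3] if (w1, w2) is at least as
    frequent as (w2, w3); otherwise right-bracket [w1 [w2 w3]]."""
    if bigrams[(w1, w2)] >= bigrams[(w2, w3)]:
        return f"[[{w1} {w2}] {w3}]"
    return f"[{w1} [{w2} {w3}]]"

# Hypothetical toy corpus.
tokens = ("the wind turbine has a blade . each wind turbine blade is long . "
          "a wind turbine").split()
counts = bigram_counts(tokens)
print(bracket("wind", "turbine", "blade", counts))  # [[wind turbine] blade]
```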
Melania Cabezas-García
added a research item
The internationalization of the economy has become a major focus in today's world. Various tasks can be carried out for the sake of internationalization, such as corporate terminology management, which ensures that the correct terms are used consistently within the company, in line with its goals. However, many enterprises do not invest in managing their terminology, which can lead to a wide range of problems. This chapter describes a general framework for managing terminology in commercial settings. The steps include (1) corpus preparation and compilation, (2) term extraction, (3) conceptual analysis, (4) identification of equivalents, and (5) representation and storage in terminology management systems. The results showed that corporate terminology management is viable with this procedure, which clearly benefits from the participation of a linguist.
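Term extraction (step 2) is often bootstrapped with frequency-based candidate lists. The minimal sketch below assumes a tiny hypothetical stopword list and a frequency threshold of 2. It keeps n-grams that neither start nor end with a stopword. It is an illustration, not the chapter's extraction method.

```python
from collections import Counter
from typing import List, Tuple

# Tiny illustrative stopword list; a real workflow would use a fuller list per language.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "with", "on"}

def candidate_terms(text: str, max_n: int = 3, min_freq: int = 2) -> List[Tuple[str, int]]:
    """Return (n-gram, frequency) candidates occurring at least min_freq times."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Discard n-grams that start or end with a stopword (e.g. "the turbine").
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(t, c) for t, c in counts.most_common() if c >= min_freq]

text = ("The wind turbine converts wind energy. Each wind turbine has blades. "
        "Wind energy is renewable.")
print(candidate_terms(text))
```

The candidates still have to be validated by the terminologist during conceptual analysis.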
Melania Cabezas-García
added a research item
Phraseology is central to specialized language. In scientific and technical communication, multiword terms (MWTs) (e.g., volatile organic compound) are the most frequent type of phraseological unit. Rendering them into another language is not an easy task due to their cognitive complexity, the proliferation of different forms, and their unsystematic representation in terminographic resources. This often results in a broad spectrum of translations for MWTs. Because MWTs consist of two or more constituents, it also leads to greater term variation. In this study, we carried out a quantitative and qualitative analysis of English term variants of MWTs from the environmental domain and their translations into Spanish. The focus was on translation variation and its occurrence in different linguistic resources.
Melania Cabezas-García
added a research item
Term variation, or the coexistence of different terms to name the same concept (e.g., contamination and pollution), is frequent in specialized language (Fernández-Silva 2018). Since variants are not always interchangeable, language users such as translators or terminologists need to know when and why one variant should be used in preference to another. Terminographic resources should facilitate this task by including different variants as well as the criteria guiding their selection. However, variants are not usually fully covered, and when they are included, indications regarding semantics, pragmatics, or usage are often not provided. This paper investigates the representation of term variation in terminographic resources. Our goals were (i) to confirm whether term variants are underrepresented and whether usage indications are usually lacking, (ii) to collect the data categories and fields employed in the description of term variants, and (iii) to propose a model for representing term variation in the terminological knowledge base EcoLexicon. Our results showed that, despite the prevalence of term variation, terminographic resources do not usually describe the different possibilities or the criteria guiding their selection. Moreover, those resources that do attempt to add pragmatic information do not present it in a parameterized way.
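Goal (iii) calls for parameterized usage information on each variant. The hypothetical data model below attaches a variant type and a usage indicator to each form. The class and field names are illustrative, not EcoLexicon's actual schema, and the register values are placeholders rather than findings of the study.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Variant:
    form: str
    variant_type: str                 # hypothetical label, e.g. "full form", "acronym"
    register: Optional[str] = None    # parameterized usage indicator (hypothetical scale)
    usage_note: Optional[str] = None  # optional free-text note

@dataclass
class ConceptEntry:
    concept_id: str
    variants: List[Variant] = field(default_factory=list)

    def forms_for(self, register: str) -> List[str]:
        """Return the forms whose usage indicator matches the requested register."""
        return [v.form for v in self.variants if v.register == register]

# Placeholder values for illustration only; they are not findings of the study.
voc = ConceptEntry("VOLATILE_ORGANIC_COMPOUND", [
    Variant("volatile organic compound", "full form", register="general"),
    Variant("VOC", "acronym", register="specialized"),
])
print(voc.forms_for("specialized"))  # ['VOC']
```

Storing the indicator as a separate field, rather than in a free-text note, is what makes it possible to filter or compare variants systematically.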
Melania Cabezas-García
added 2 research items
EcoLexicon is a multilingual terminological knowledge base on environmental science. It has been developed since 2003 by the LexiCon research group at the University of Granada (Spain), and it is the practical application of Frame-based Terminology. This article describes how EcoLexicon works and presents its latest developments. These include a new corpus and a semantic word sketch grammar for English, a reform of the phraseological module, a flexible approach to terminological definitions, and conceptual representation through images.
Melania Cabezas-García
added a research item
Multiword terms are one of the most distinctive, and also most problematic, features of specialized discourse. This book explores controversial aspects such as their formation, translation, and representation. The formation of multiword terms can be studied through a slot-filling mechanism activated by the head, which is called the microcontext. The different term formation patterns and the lack of systematicity in their description complicate the translation of multiword terms. Their representation, in turn, must cover the various characteristics of these units. One example is the multiword term section designed in EcoLexicon, a terminological knowledge base on the environment.
Melania Cabezas-García
added a research item
In scientific and technical communication, multiword terms (MWTs) are the most frequent type of lexical unit. Rendering them in another language is not an easy task due to their cognitive complexity, the proliferation of different forms, and their unsystematic representation in terminographic resources. This often results in a broad spectrum of translations for MWTs, whose composition of two or more constituents also fosters term variation. In this study, we carried out a quantitative and qualitative analysis of Spanish translation variants of a set of environment-related concepts. To do so, we evaluated equivalents in three parallel corpora, two comparable corpora, and two terminological resources. Our results showed that MWTs exhibit a significant degree of term variation with different characteristics. These characteristics were used to establish a set of criteria for selecting, organizing, and describing term variants in terminological knowledge bases.
Antonio San Martín
added a research item
Hyponymy is the cornerstone of taxonomies and concept hierarchies. However, extracting hypernym-hyponym pairs from a corpus can be time-consuming, and reconstructing the hierarchical network of a domain is often an extremely complex process. This paper presents the development and evaluation of the French EcoLexicon Semantic Sketch Grammar (ESSG-fr), a French hyponymic sketch grammar for Sketch Engine based on knowledge patterns. It offers a user-friendly way of extracting hyponymic pairs in the form of word sketches from any user-owned corpus. The ESSG-fr contains three times as many hyponymic patterns as its English counterpart and has been tested on a multidisciplinary corpus. It is thus expected to be domain-independent. Moreover, the following methodological innovations have been included in its development: (1) use of English hyponymic patterns in a parallel corpus to find new French patterns; and (2) automatic inclusion of the results of the Sketch Engine thesaurus to find new variants of the patterns. In the evaluation, 70% of the hypernyms and hyponyms returned by the ESSG-fr were valid, measured on 180 extracted term pairs from three different domains.
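The ESSG-fr is written in Sketch Engine's sketch grammar formalism and runs over a POS-tagged corpus. As a much simpler stand-in, a single French knowledge pattern of the form "X tel(le)s que A, B et C" can be approximated with a regular expression. The pattern, the article stripping, and the example sentence are assumptions for illustration, not rules from the ESSG-fr.

```python
import re
from typing import List, Tuple

# One illustrative pattern: "<hypernym> tel(le)(s) que <enumeration>".
PATTERN = re.compile(r"(\w+) tel(?:le)?s? que ([^.;:]+)")
# Leading French articles to strip from each hyponym.
ARTICLE = re.compile(r"^(?:(?:les?|la|une?|des)\s+|l['’])")

def extract_pairs(sentence: str) -> List[Tuple[str, str]]:
    """Return (hypernym, hyponym) pairs matched by PATTERN in the sentence."""
    pairs = []
    for m in PATTERN.finditer(sentence):
        hypernym = m.group(1)
        # Split the enumeration "A, B et C" (or "A ou B") into single items.
        for item in re.split(r",\s*|\s+et\s+|\s+ou\s+", m.group(2)):
            item = ARTICLE.sub("", item.strip(), count=1)
            if item:
                pairs.append((hypernym, item))
    return pairs

s = ("Le rayonnement est absorbé par des gaz tels que le méthane, "
     "le dioxyde de carbone et l'ozone.")
print(extract_pairs(s))
# [('gaz', 'méthane'), ('gaz', 'dioxyde de carbone'), ('gaz', 'ozone')]
```

A plain regex cannot check parts of speech or multiword hypernyms. That gap is why pattern grammars over tagged corpora, such as sketch grammars, are preferred in practice.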
Beatriz Sánchez-Cárdenas
added a research item
Scientific and technical translation has undergone an unprecedented revolution in recent years. Among other factors, this is due to technological advances, the dissemination of scientific content in multimedia formats, the emergence of new text genres and translation modes, and greater demands for productivity and efficiency. All of this makes it necessary to rethink academic approaches to this type of translation, so that they reflect the profound recent changes in the language services industry. This book is aimed primarily at translation scholars and professionals. It sets out the theoretical foundations that every scientific translator should master and presents the linguistic and computer tools that can facilitate their work. The book takes a cross-disciplinary perspective that draws on theoretical contributions from Translation Studies, Linguistics, Psychology, Corpus Linguistics, and Terminology. From this perspective, it examines the process of scientific translation, its professional context, and its main theoretical and practical difficulties. It also reviews the competences, tools, and resources most in demand in the professional environment. As an innovative element, the authors offer a methodological basis for solving the linguistic challenges of scientific translation with the help of specialized electronic corpora and practical examples taken from real texts. https://www.comares.com/libro/retos-de-la-traduccion-cientifico-tecnica-profesional_107152/
Juan Carlos Gil-Berrozpe
added a research item
Contemporary research has focused on how concepts are represented and organized in the mind, leading to neurocognitive theories such as grounded cognition or embodied cognition. These theories have greatly influenced further studies in linguistics and terminology. Conceptualization, categorization, and knowledge organization are thus the foundation of cognitive-oriented terminology theories, such as Frame-based Terminology, which highlight the relevance of situated knowledge structures. The practical application of Frame-based Terminology is EcoLexicon, a dynamic terminological knowledge base on environmental science. Concepts in this terminological resource are domain-specific and are framed within the Environmental Event, a model that interrelates concepts by assigning them different roles. However, the Environmental Event does not include specific category types to annotate these concepts ontologically. Therefore, this paper presents a process of ontological knowledge enhancement in EcoLexicon. This process was mainly based on the categorization of its concepts in semantic classes with a multidimensional approach. As a result, EcoLexicon was ontologically enhanced not only in terms of this categorization, but also through a redesign of the conceptual categories module, which involved modifying the existing category hierarchy and implementing new features focused on describing the combinatorial potential of concepts and categories (i.e. the conceptual combinations function and the ontological view).
Miriam Buendía Castro
added a research item
The Environment is a relatively new specialized domain. In the last few years, various online environmental resources have emerged, and large databases have started to include environmental terms within their entries. However, because of its relatively recent appearance, environmental terminology has not been standardized and its treatment in dictionaries has not been systematized. As such, this paper describes and compares a set of 18 online bilingual or multilingual (including Spanish and English) specialized resources on the Environment to evaluate their usefulness for translation. The focus is on internet dictionaries according to the classification proposed by De Schryver (2003, 151). This analysis focused on how each resource deals with access to phraseological information and the description of the information given for both the source term and the translated term(s). The headword ‘erosion’ was used for purposes of comparison.
Melania Cabezas-García
added a research item
Multi-word terms pose many challenges in Natural Language Processing (NLP) because of their structural ambiguity. Although the structural disambiguation of multi-word expressions, also known as bracketing, has been widely studied, no definitive solution has yet been found. Linguists, terminologists, and translators must also deal with bracketing problems, but they generally have to resolve them without advanced NLP systems. This paper describes a series of manual steps for the bracketing of multi-word terms (MWTs) based on their linguistic properties and recent advances in NLP. After analyzing 100 three- and four-term combinations, a set of criteria for MWT bracketing was devised and arranged in a step-by-step protocol based on frequency and reliability. Also presented is a case study that illustrates the procedure.
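For readers unfamiliar with bracketing, the following sketch shows one frequency-based criterion from the NLP literature, the adjacency heuristic: a three-word compound is bracketed to the left when its first two words co-occur at least as often as its last two. This is a swapped-in illustration, not the manual protocol described in the paper, and the bigram counts in `EXAMPLE_COUNTS` are hypothetical.

```python
from collections import Counter

# Hypothetical corpus bigram frequencies (illustrative only).
EXAMPLE_COUNTS = Counter({("wind", "turbine"): 50, ("turbine", "blade"): 30})


def bracket(w1, w2, w3, bigram_counts):
    """Bracket a three-word compound with the adjacency heuristic.

    Left bracketing [[w1 w2] w3] if (w1, w2) is at least as frequent as
    (w2, w3); right bracketing [w1 [w2 w3]] otherwise. Missing bigrams
    count as 0 because bigram_counts is a Counter.
    """
    left = bigram_counts[(w1, w2)]
    right = bigram_counts[(w2, w3)]
    if left >= right:
        return f"[[{w1} {w2}] {w3}]"
    return f"[{w1} [{w2} {w3}]]"


print(bracket("wind", "turbine", "blade", EXAMPLE_COUNTS))  # [[wind turbine] blade]
```

A single frequency comparison like this is easily misled by sparse counts, which is one reason why manual criteria ordered by reliability remain useful for terminologists.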
Juan Rojas-Garcia
added a research item
Knowledge patterns (KPs), i.e. markers that convey semantic relations, are frequently used to extract conceptual information from a corpus. This paper describes a semi-automatic method based on KPs for exploring the semantic relations that underlie the automatic clustering of terms in a corpus of English environmental texts. A clustering technique was applied to the semantic vectors generated by a distributional semantic model (DSM), which can estimate semantic similarity between terms. KP queries were performed in Sketch Engine to find evidence of the semantic coherence of one of the three clusters obtained. The KP-based approach, combined with a DSM and a clustering technique, was used for an in-depth semantic analysis of two terms belonging to the same cluster, namely, the entity beach and the process erosion. The results showed that: (1) both beach and erosion held a limited number of relations with other terms; (2) most of their related terms also belonged to the same cluster as beach and erosion; and (3) the specific relations held by the terms beach and erosion depended on the type of concept they designate, namely, an entity or a process, respectively. The analysis also showed that adding clustering information to the KPs used in this paper is expected to provide the basis for including semantic constraints (e.g., semantic categories) in KPs.
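The clustering step can be pictured with a small sketch: compute cosine similarity between term vectors and group terms whose similarity exceeds a threshold (single-linkage clustering via union-find). The abstract does not specify the clustering technique, so this choice is an assumption, and the three-dimensional vectors in `EXAMPLE_VECTORS` are hypothetical stand-ins for real DSM output.

```python
import math

# Hypothetical term vectors standing in for DSM output (illustrative only).
EXAMPLE_VECTORS = {
    "beach": [0.9, 0.1, 0.0],
    "erosion": [0.8, 0.2, 0.1],
    "sediment": [0.85, 0.15, 0.05],
    "turbine": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_terms(vectors, threshold=0.8):
    """Single-linkage clustering: terms whose cosine similarity is at least
    `threshold` are merged into the same cluster (transitively)."""
    terms = list(vectors)
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for t in terms:
        clusters.setdefault(find(t), []).append(t)
    return sorted(sorted(c) for c in clusters.values())


print(cluster_terms(EXAMPLE_VECTORS))  # [['beach', 'erosion', 'sediment'], ['turbine']]
```

In the method described above, KP queries then serve as a check on whether the terms grouped in a cluster are actually linked by explicit semantic relations in the corpus.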
Juan Rojas-Garcia
added a research item
EcoLexicon is a terminological knowledge base on environmental science, whose design permits the geographic contextualization of data. For the geographic contextualization of landform concepts, this paper presents a semi-automatic method for extracting terms associated with named rivers (e.g., Mississippi River). Terms were extracted from a specialized corpus, where named rivers were automatically identified. Statistical procedures were applied for selecting both terms and rivers in distributional semantic models to construct the conceptual structures underlying the usage of named rivers. The rivers sharing associated terms were also clustered and represented in the same conceptual network. The results showed that the method successfully described the semantic frames of named rivers with explanatory adequacy, according to the premises of Frame-Based Terminology.
Juan Rojas-Garcia
added a research item
EcoLexicon (http://ecolexicon.ugr.es) is a terminological knowledge base on environmental science, whose design permits the geographic contextualization of data. For the geographic contextualization of LANDFORM concepts, this paper presents a semi-automatic method of extracting terms associated with named bays (e.g., Greenwich Bay). Terms were extracted from a specialized corpus, where named bays were automatically identified. Statistical procedures were applied for selecting both terms and bays in distributional semantic models to construct the conceptual structures underlying the usage of named bays. The bays sharing associated terms were also clustered and represented in the same conceptual network. The results showed that the method successfully described the semantic frames of named bays with explanatory adequacy, according to the premises of Frame-based Terminology. SPANISH ABSTRACT (translated): EcoLexicon (http://ecolexicon.ugr.es) is a terminological knowledge base on environmental science whose design permits the geographic contextualization of concepts in the LANDFORM category. To that end, this article presents a semi-automatic method for extracting terms associated with named bays (e.g., Pensacola Bay). The terms were extracted from a specialized corpus in which bay names were automatically identified. Statistical procedures were applied to select bays and terms, which were projected onto vector semantic spaces and used to construct the conceptual structures underlying the usage of the bays. The results show that the method is suitable for describing the semantic frames evoked by the bays, according to the premises of Frame-based Terminology. Keywords: Named bay, Conceptual information extraction, Geographic contextualization, Text mining, Frame-based Terminology.
Melania Cabezas-García
added a research item
Understanding specialized discourse requires the identification and activation of knowledge structures underlying the text. The expansion and enhancement of knowledge is thus an important part of the specialized translation process (Faber 2015). This paper explores how the analysis of terminological meaning can be addressed from the perspective of Frame-Based Terminology (FBT) (Faber 2012, 2015), a cognitive approach to domain-specific language, which directly links specialized knowledge representation to cognitive linguistics and cognitive semantics. In this study, context expansion was explored in a three-stage procedure: from single terms to multi-word terms, from multi-word terms to phrases, and from phrases to frames. Our results showed that this approach provides valuable insights into the identification of the knowledge structures underlying specialized texts.
Melania Cabezas-García
added a research item
Multi-word terms (MWTs), in the form of noun compounds (NCs), are frequently used in specialized texts (Nakov 2013). They consist of juxtaposed terms with underlying semantic structures that limit the combination of arguments (Pinker 1989). However, NCs formed by more than two terms have received little attention. This study focuses on English and Spanish three-term endocentric NCs used in Coastal Engineering. To explore the presence of semantic preference and semantic prosody in these MWTs, a set of terms has been extracted from a Coastal Engineering corpus. The structure of the MWTs has been disambiguated and the semantic relations between their components have been specified. Verb paraphrases have also been elicited from field experts and the web, and then semantically analyzed. The results show that semantic preference and semantic prosody play an important role in the formation of MWTs and should be taken into account when rendering a text into another language.
Juan Carlos Gil-Berrozpe
added a research item
Specialized translation is a discipline that can benefit greatly from access to high-quality linguistic and terminological resources. Most of these resources are multilingual so as to facilitate translation in different language combinations, but Chinese is not usually one of the languages available. Accordingly, in this pilot study, Chinese was preliminarily included in EcoLexicon, a terminological knowledge base specialized in environmental concepts and developed by the LexiCon research group at the University of Granada. Chinese thus became the first Asian language in EcoLexicon and the eighth language available overall. New environmental concepts and terms related to technological and cultural aspects specific to China were also incorporated into EcoLexicon, and the various challenges arising from their translation into the other languages, especially Spanish, were analyzed. In conclusion, this work demonstrates the benefits of including Chinese in terminological resources in order to foster intercultural communication at multiple levels.
Antonio San Martín
added a research item
The EcoLexicon English Corpus (EEC) is a 23.1-million-word corpus of contemporary environmental texts. It was compiled by the LexiCon research group for the development of EcoLexicon (Faber, León-Araúz & Reimerink 2016; San Martín et al. 2017), a terminological knowledge base on the environment. It is available as an open corpus in the well-known corpus query system Sketch Engine (Kilgarriff et al. 2014), which means that any user, even without a subscription, can freely access and query the corpus. In this paper, the EEC is introduced by describing how it was built and compiled and how it can be queried and exploited, based both on the functionalities provided by Sketch Engine and on the parameters in which the texts in the EEC are classified.
Antonio San Martín
added a research item
This paper presents a new method of creating definitions, which can be integrated into the workflow of any specialized lexical resource to solve one of the problems of specialized definitions, namely, the fact that they do not always meet the needs of their target users. One of the reasons for this is the one-size-fits-all approach that is usually followed. The current practice in specialized lexicography is for each term to have only one definition. However, the knowledge transmitted by a term in real use events varies, depending on the context of activation. As a consequence, terminological definitions tend to be either too general, in an effort to encompass all possible contexts, or too specific, which means that many applicable contexts are omitted. This paper characterizes the contextual dimension of terminological definitions and describes the flexible terminological definition approach, which includes a corpus-based methodology for crafting contextualized definitions founded on cognitive linguistics principles. As a practical example, we applied this methodology to the elaboration of definitions of environmental terms with Caribbean thematic-cultural contextual constraints. The resulting flexible definitions (based on corpus evidence) reflect the conceptual content of environmental terms from a Caribbean perspective. Consequently, these definitions provide more relevant information for users such as translators, legislators or scientific writers dealing with environmental terminology in the Caribbean context. It thus follows that incorporating the flexible definition approach in Caribbean specialized lexicography would increase the quality of its lexical resources.
Melania Cabezas-García
added a research item
In English, the international language of communication (Tono in Lexicography 1(1):1–5, 2014), complex nominals (CNs) are frequently used to convey specialized concepts (Sager et al. in English special languages. Principles and practice in science and technology. Brandstetter Verlag, Wiesbaden, 1980; Nakov in Natural Language Engineering 19(03):291–330, 2013). These phraseological units have a nominal head that is modified by another element (e.g., hydropower production). Problems can arise in relation to their identification, their bracketing or internal structure disambiguation, their meaning access, and their translation or production in another language. Although they are not marginal phenomena in specialized language, they are rarely included in specialized resources. Even when they are included, their treatment is not systematic (Cabezas-García and Faber in Computational and corpus-based phraseology. Springer, Cham, pp 145–159, 2017a). This article describes the representation of CNs in EcoLexicon (http://www.ecolexicon.ugr.es), a terminological knowledge base, whose new phraseological module will include verb collocations (e.g., a volcano spews lava) as well as CNs. For that purpose, we used a wind power corpus in English and Spanish for term extraction, semantic analysis, establishment of interlinguistic correspondences, and definition crafting. We propose different access points to information (Kwary in International Journal of Lexicography 25(1):30–49, 2012), such as the CNs formed from a given term, a bilingual view in English and Spanish, or the syntactic–semantic combinations in CNs. The structure of the CN module is based on the semantics of these phraseological units, which facilitates the specification of mapping rules as well as knowledge acquisition (Faber in A cognitive linguistics view of terminology and specialized language. De Gruyter Mouton, Berlin, 2012).
Juan Carlos Gil-Berrozpe
added a research item
In English, specialized concepts frequently take the form of complex nominals (CNs), e.g. greenhouse gas emissions. The syntactic-semantic complexity of these multi-word terms (MWTs) highlights the need for a systematic treatment in specialized resources. This paper explores how semantic patterns in CNs can be applied to retrieve information in terminological knowledge bases, specifically in EcoLexicon (http://ecolexicon.ugr.es), the practical application of Frame-based Terminology (Faber 2012). For that purpose, we extracted the 250 most frequent CNs in an English wind power corpus. Structural disambiguation was performed to identify the internal groups linked by semantic relations. Ad-hoc semantic categories were then assigned to the elements of CNs with a view to studying the formation of CNs and allowing semantic-based queries in EcoLexicon. Then, the semantic relations between the CN constituents were analyzed by means of knowledge patterns and paraphrases. Our preliminary results showed recurrent semantic patterns in CN formation. This facilitates the inference of semantic relations, which is one of the main difficulties of MWTs. Furthermore, a semantic-based view of the CN module of EcoLexicon is presented, which allows different types of semantic query.
Antonio San Martín
added a research item
Many projects have applied knowledge patterns (KPs) to the retrieval of specialized information. Yet terminologists still rely on manual analysis of concordance lines to extract semantic information, since there are no user-friendly publicly available applications enabling them to find knowledge-rich contexts (KRCs). To fill this void, we have created the KP-based EcoLexicon Semantic Sketch Grammar (ESSG) in the well-known corpus query system Sketch Engine. For the first time, the ESSG is now publicly available in Sketch Engine to query the EcoLexicon English Corpus. Additionally, because most of the KPs are domain-independent, reusing the ESSG in any English corpus uploaded by the user enables Sketch Engine to extract KRCs codifying generic-specific, part-whole, location, cause and function relations. The information is displayed in the form of summary lists (word sketches) containing the pairs of terms linked by a given semantic relation. This paper describes the process of building a KP-based sketch grammar with special focus on the last stage, namely, evaluation for refinement purposes. We conducted an initial shallow precision and recall evaluation of the 64 English sketch grammar rules created so far for hyponymy, meronymy and causality. Precision was measured based on a random sample of concordances extracted from each word sketch type. Recall was assessed based on a random sample of concordances where known term pairs are found. The results are necessary for the improvement and refinement of the ESSG. The noise of false positives helped to further specify the rules, whereas the silence of false negatives allowed us to find useful new patterns.
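The two evaluation measures described above can be expressed compactly. In this sketch, precision is the share of sampled extractions that a human judge marks as correct, and recall is the share of known term pairs (found in sampled concordances) that the rules actually extracted. The function names and the sample data are hypothetical; they illustrate the arithmetic, not the authors' evaluation scripts or results.

```python
def precision(judgments):
    """Share of sampled word-sketch extractions judged correct (True)."""
    return sum(judgments) / len(judgments) if judgments else 0.0


def recall(known_pairs, extracted_pairs):
    """Share of known term pairs that the sketch grammar rules extracted."""
    known = set(known_pairs)
    if not known:
        return 0.0
    return len(known & set(extracted_pairs)) / len(known)


# Hypothetical evaluation sample (illustrative only).
sample_judgments = [True, True, False, True]
known = {("rock", "granite"), ("rock", "basalt")}
extracted = [("rock", "granite"), ("soil", "clay")]

print(precision(sample_judgments))  # 0.75
print(recall(known, extracted))     # 0.5
```

Here the false positive ("soil", "clay") lowers precision if judged wrong, while the missed pair ("rock", "basalt") lowers recall; these are the noise and silence that guide rule refinement.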
Juan Carlos Gil-Berrozpe
added a research item
Hyponymy is a central relation in many models of the lexicon. However, when this type of relation is not accurately represented in knowledge resources, various problems arise, ranging from information overload to failures in transitivity. A possible solution is to specify hyponymy by decomposing it into a more fine-grained set of subtypes. This paper reviews how hyponymy is built in EcoLexicon, a multilingual terminological knowledge base on the environment, and proposes a set of hyponymy subtypes based on the conceptual networks contained in EcoLexicon as well as on corpus analysis.
Melania Cabezas-García
added a research item
Complex nominals (CNs) (e.g. wind turbine) are very common in English specialized texts (Nakov, 2013). However, all too frequently they show similar external forms but encode different semantic relations because of noun packing. This paper describes the use of paraphrases that convey the conceptual content of English two-term CNs (Nakov and Hearst, 2006) in the domain of environmental science. The semantic analysis of CNs was complemented by the use of knowledge patterns (KPs), which are lexico-syntactic patterns that usually convey semantic relations in real texts (Meyer, 2001; Marshman, 2006). Furthermore, the constituents of CNs were semantically annotated with conceptual categories (e.g. beach [LANDFORM] erosion [PROCESS]) with a view to disambiguating the semantic relation between the constituents of the CN and developing a procedure to infer the semantic relations in these multi-word terms. The results showed that the combination of KPs and paraphrases is a helpful approach to the semantics of CNs. Accordingly, the conceptual annotation of the constituents of CNs revealed similar patterns in the formation of these complex terms, which can lead to the inference of concealed semantic relations.
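The idea of inferring a concealed relation from the categories of a CN's constituents can be sketched as a lookup table: count which relation is annotated most often for each (modifier category, head category) pair, then predict that relation for new CNs with the same category pair. This majority-vote scheme, the relation labels, and the annotated examples are hypothetical illustrations, not the procedure or data from the paper.

```python
from collections import Counter, defaultdict


def train(annotated):
    """Build a table from (modifier_category, head_category, relation) triples."""
    table = defaultdict(Counter)
    for mod_cat, head_cat, relation in annotated:
        table[(mod_cat, head_cat)][relation] += 1
    return table


def infer(table, mod_cat, head_cat):
    """Return the most frequent relation for a category pair, or None if unseen."""
    counts = table.get((mod_cat, head_cat))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical annotated CNs (illustrative only), e.g. beach [LANDFORM] erosion [PROCESS].
ANNOTATED = [
    ("LANDFORM", "PROCESS", "affects"),
    ("LANDFORM", "PROCESS", "affects"),
    ("LANDFORM", "PROCESS", "located_at"),
    ("ENERGY", "INSTRUMENT", "has_function"),
]

TABLE = train(ANNOTATED)
print(infer(TABLE, "LANDFORM", "PROCESS"))  # affects
```

A real resource would also need to handle ties and category pairs with several plausible relations, which is where paraphrases and KPs provide the disambiguating evidence.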
Juan Carlos Gil-Berrozpe
added 2 research items
The organization of a terminological knowledge base (TKB) relies on the identification of relations between concepts. This involves making an inventory of semantic relations and extracting these relations from a corpus by means of knowledge patterns (KPs). In EcoLexicon, a multilingual and multimodal TKB on the environment, 17 semantic relations are currently being used to link environmental concepts. These relations include six subtypes of meronymy, but only one subtype of hyponymy (type_of). However, a recent pilot study (Gil-Berrozpe et al., in press) showed that the generic-specific relation could also be subdivided. Interestingly, these preliminary results indicated that hyponymy subtypes were constrained by the ontological nature of concepts, depending on whether they were entities or processes. The new proposal presented in this paper expands the scope of our preliminary research on hyponymy subtypes to include concepts belonging to a wider range of semantic categories, and examines the behavior of knowledge patterns used to extract hyponymic relations. In this research, corpus analysis was used to explore the correlation of concepts in many different categories with KPs as well as with hyponymy subtypes. Thanks to these constraints, it was possible to formulate a more comprehensive inventory of generic-specific relations in the environmental domain.
This chapter analyzes the effectiveness of EcoLexicon for specialized translation and discusses certain problems derived from an overly simple definition of generic-specific relations. Along these lines, we explore and assess ‘umbrella concepts’ as a means of restricting the sense of hyponymy. Moreover, we describe the context and methodology for creating them and increasing their number. Our study resulted in the specification of a new and enhanced set of umbrella concepts that have improved the conceptual structure of the EcoLexicon knowledge base.
Silvia Montero Martínez
added a research item
This study examines the manifestation and conceptual construction of conceptual metaphors in English and Arabic scientific texts on climate change. It also describes how translation from English into Arabic influences the loss of domains. To this end, it presents the results of a corpus-based contrastive terminological analysis from the perspective of Conceptual Metaphor Theory (Lakoff and Johnson 2003), Frame-based Terminology (Faber et al. 2012), and Frame Semantics (Faber 2015). This approach allows reality to be represented through cognitive frames, resulting from a top-down and bottom-up process, so that it can be categorized on a multidimensional and dynamic plane that facilitates the extraction of semantic-conceptual information from the working corpora.
Miriam Buendía Castro
added 2 research items
This study proposes a methodology to disambiguate verb meaning in terminographic resources. To this end, the approach underlying the description of verb entries in the environmental knowledge base EcoLexicon (Buendía, Montero and Faber 2014; Faber and Buendía 2014) has been applied. The description is based on three parameters: (i) the nuclear meaning of the verb (i.e. its lexical domain, as proposed by the Lexical Grammar Model (Faber & Mairal 1999)); (ii) its meaning dimension (i.e. the lexical subdomain); and (iii) its predicate-argument structure, highlighting the semantic categories of the arguments. Our study shows that a verb can activate different meaning dimensions in different lexical domains, depending on the semantic categories or the semantic roles (Van Valin 2005) of its arguments. Describing verb meaning according to lexical domains and subdomains, semantic roles, and semantic categories helps to disambiguate verbs and represent their meaning in specialized resources.
Antonio San Martín
added 10 research items
Ontologies have been criticized because they demand too much work or because they are not sufficiently flexible to capture the dynamism and complexity of reality (Kingston 2008). However, even though any representation of reality is imperfect, ontologies are the type of computational knowledge representation that best approximates the domain being conceptualized. In fact, they have increasingly come into focus because of the need for knowledge management and shared knowledge in both general and specialized knowledge domains. EcoLexicon is a frame-based visual thesaurus on the environment, whose knowledge is stored in a relational database, and which is gradually evolving towards the status of a formal ontology (León et al. 2008; León and Magaña 2010). This paper describes the conceptual modeling techniques used in this knowledge resource, and the underlying theoretical premises that enable its contextualization and connection to general knowledge structures and resources.
Conceptual modeling is the activity of formally describing aspects of the physical and social world around us for purposes of understanding and communication. The conceptual modeler thus has to determine what aspects of the real world to include in, and exclude from, the model, and at what level of detail to model each aspect [Kotiadis and Robinson, 2008]. The way that this is done depends on the needs of the potential users or stakeholders, the domain to be modeled, and the objectives to be achieved. A principled set of conceptual modeling techniques is thus a vital necessity in the elaboration of resources that facilitate knowledge acquisition and understanding. In this respect, the design and creation of terminological databases for a specialized knowledge domain is extremely complex since, ideally, the data should be interconnected in a semantic network by means of an explicit set of semantic relations. Nevertheless, despite the acknowledged importance of conceptual organization in terminological resources [Puuronen, 1995], [Meyer et al., 1997], [Pozzi, 1999], [Pilke, 2001], conceptual organization does not appear to play an important role in their design. Astonishingly few specialized knowledge resources available on the Internet contain information regarding the location of concepts in larger knowledge configurations [Faber et al., 2006]. Such knowledge resources do not take into account the dynamic nature of categorization, concept storage and retrieval, and cognitive processing [Louwerse and Jeuniaux, 2010], [Aziz-Zadeh and Damasio, 2008], [Patterson et al., 2007], [Gallese and Lakoff, 2005]. Recent theories of cognition reflect the assumption that cognition is typically grounded in multiple ways, e.g. simulations, situated action, and even bodily states.
This means that a specialized knowledge resource that facilitates knowledge acquisition should thus provide conceptual contexts or situations in which a concept is conceived as part of a process or event. Since knowledge acquisition and understanding requires simulation, this signifies that horizontal relations defining goal, purpose, affordance, and result of the manipulation and use of an object are just as important, if not more so, than vertical generic-specific and part-whole relations. Within the context of recent theories of cognition, this paper examines the frame-based conceptual modeling principles underlying EcoLexicon, a multilingual knowledge base of environmental concepts (http://ecolexicon.ugr.es/) [Faber et al., 2005, 2006, 2007].
EcoLexicon (http://ecolexicon.ugr.es) is a terminological knowledge base on the environment that currently holds 3,351 concepts and a total of 17,475 terms in English, Spanish, German, Russian, French, and Modern Greek. Concepts are linked by means of hierarchical and non-hierarchical relations in dynamic networks and in definitions. The environmental domain is interdisciplinary and its concepts can be categorized from different perspectives, so conceptual representation needs to be multidimensional. Unlike those in other knowledge resources, conceptual representations in EcoLexicon reflect multidimensional categorization, but this has also produced an information overload, particularly at upper concept levels. This means that many concepts show overloaded networks, partly caused by multiple inheritance, as many of them have several hyperonyms. However, not all conceptual dimensions are activated at the same time; rather, they are context-dependent. Since the context of a concept is the set of concepts relevant to its intended meaning, we solved the information overload problem by recontextualizing networks in terms of discipline-based domains. The recontextualization of concepts constrains their relations with other concepts, depending on the activation scenario. By no means does this imply that these are different senses of a polysemous term; rather, concepts also vary by context regardless of sense variation. Given that terminological definitions are also an integral part of the representation of multidimensionality, we applied the same contextual constraints to definitional propositions. The result is what we call flexible terminological definitions. This paper describes the representation of context-dependent multidimensionality in EcoLexicon and, more specifically, how this phenomenon is managed in terminological definitions.
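Recontextualization can be pictured as filtering a concept's network by the domain of the activation scenario: each relation is tagged with the discipline-based domains in which it holds, and only the relations tagged with the current domain are shown. The `Relation` structure, the domain names, and the example relations for sand are hypothetical; they illustrate the principle rather than EcoLexicon's actual database schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source: str
    relation: str
    target: str
    domains: frozenset  # discipline-based domains in which this relation is activated


# Hypothetical relations for one concept (illustrative only).
RELATIONS = [
    Relation("sand", "type_of", "sediment", frozenset({"Geology", "Coastal Engineering"})),
    Relation("sand", "type_of", "building material", frozenset({"Civil Engineering"})),
    Relation("sand", "part_of", "beach", frozenset({"Coastal Engineering"})),
]


def recontextualize(relations, concept, domain):
    """Keep only the relations of `concept` that are activated in `domain`."""
    return [
        r for r in relations
        if concept in (r.source, r.target) and domain in r.domains
    ]


for r in recontextualize(RELATIONS, "sand", "Coastal Engineering"):
    print(r.source, r.relation, r.target)
```

Filtering by domain trims the multiple-inheritance links (here, the second hypernym) that cause overload, without splitting the concept into separate senses.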
Melania Cabezas-García
Multi-word terms (MWTs) are the main way that concepts are linguistically expressed in specialized domains. Accessing the semantic content of these compressed propositions is the first step toward understanding and translating them. Until now, most studies have focused on two-term compounds (Kim & Baldwin, 2013). This paper, however, deals with three-term English and Spanish endocentric noun compounds in the specialized domain of Coastal Engineering. Our analysis involved parsing, bracketing, and the assignment of semantic relations. The meaning of the MWTs was then expanded through paraphrasing (Nakov, 2013). Our results showed that a predicate-based analysis facilitated the specification of the relations between the concepts in MWTs as well as the mapping of this content onto the corresponding term in the target language.
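The bracketing step for three-term compounds can be illustrated with the adjacency model, a well-known corpus-based heuristic for noun compound bracketing. The abstract does not say which bracketing method was used, so this Python sketch is an assumed, swapped-in technique; the `bracket` function is hypothetical, and the association scores below are invented toy counts, not results from the study.

```python
def bracket(w1, w2, w3, assoc):
    """Adjacency-model bracketing of a three-noun compound "w1 w2 w3".

    `assoc` maps a (modifier, head) pair to an association score, for example
    the corpus frequency of that two-noun compound. If w1-w2 is at least as
    strongly associated as w2-w3, the compound is left-bracketed
    [[w1 w2] w3]; otherwise it is right-bracketed [w1 [w2 w3]].
    Ties default to left bracketing (an arbitrary choice in this sketch).
    """
    left = assoc.get((w1, w2), 0)
    right = assoc.get((w2, w3), 0)
    if left >= right:
        return f"[[{w1} {w2}] {w3}]"
    return f"[{w1} [{w2} {w3}]]"


# Hypothetical toy counts of two-noun compounds in a coastal engineering corpus.
COUNTS = {
    ("storm", "surge"): 120,
    ("surge", "barrier"): 45,
    ("beach", "sand"): 5,
    ("sand", "transport"): 80,
}

if __name__ == "__main__":
    print(bracket("storm", "surge", "barrier", COUNTS))
    print(bracket("beach", "sand", "transport", COUNTS))
```

Once a compound is bracketed, the semantic relation between each inner pair and its head can be assigned and then expanded by paraphrase, as the abstract describes.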
Melania Cabezas-García
Complex nominals (CNs) are characterized by the omission of the semantic relation between their constituents due to noun packing. Despite their frequency in specialized texts written in English [1], their representation and inclusion in knowledge resources have received little research attention. This paper presents a proposal for the inclusion of CNs in an English terminographic resource on renewable energy. For that purpose, we used knowledge patterns and paraphrases to access the meaning of CNs in a wind power corpus. We then filled the definitional templates proposed by Frame-based Terminology [2]. Our main goal was to organize a term entry conceptually so as to facilitate understanding of the domain while keeping the entry length to a minimum. Furthermore, this proposal is a valuable starting point toward the development of bilingual and multilingual resources, since translation should be based on meaning. Our results also afforded insights into compound term formation in English, as reflected in the addition of specific values to the semantic relations encoded by the hypernym. Term instability and multidimensionality were also prevalent.
Miriam Buendía Castro
This research shows how to identify diatopic differences in verb phraseology between Mexican and Peninsular Spanish in environmental texts, and more specifically, within the domain of natural disasters. The study is performed by analyzing five Mexican dictionaries as well as a specialized corpus. The environmental event (EE) and the semantic category of natural disasters were organized based on the premises of Frame-based Terminology (FBT) (Faber 2009, 2011, 2012). In FBT, the EE represents and configures the most generic categories within the field of environmental science. Semantic categories in FBT are generalizations of a set of terms that have a similar semantic and syntactic behavior. To detect diatopic variants, an integrated top-down and bottom-up approach was followed. Accordingly, all the potential members of the semantic category natural disaster in Spanish were searched in various dictionaries (top-down approach). Subsequently, these terms were extracted and analyzed in corpus texts (bottom-up approach) to find the most frequent verb collocations and argument patterns. This research highlights that phraseological diatopic varieties exist at the morphosyntactic, morphological, and lexical levels in specialized discourse. The conclusion is that specialized dictionaries and other terminographic resources should incorporate these varieties so that users can become aware of them and use them when needed.
Melania Cabezas-García
EcoLexicon is a multilingual terminological knowledge base on the environment. It is the practical application of Frame-based Terminology, a cognitive approach to the representation of specialized knowledge. Recent enhancements include the EcoLexicon English corpus, a phraseological module, and a flexible approach to terminological definitions.
Pamela Faber
In this study on food terminology and culture, Frame-based Terminology Theory (FBT) (Faber 2012, 2015) was combined with corpus analysis to explore the use of culture-specific terms in the food categories of bread and rice. For the sake of comparison, semplates (Levinson and Burenhult 2009) were formulated for food, bread, and rice, as a kind of cultural frame to highlight the relatedness of these categories, based on the actions that were most frequently linked to them in our corpus. For this purpose, an FBT semantic analysis of these terms in a general language corpus was combined with an analysis of their cultural contexts in the literary work of authors such as Sandra Cisneros, Najat El Hachmi, Chimamanda Adichie, and others. The situations portrayed in their novels reflect the cultural embeddedness of food and its communicative value.
Melania Cabezas-García
Scientific and technological advances generate new concepts, and thus, new terms to designate them (Štekauer 1998; Cartier and Sablayrolles 2008). Usually, terms are first created in English (Sanz Vicente 2012), the lingua franca of communication. In specialized discourse, the prevalent terms are noun compounds (Nakov 2013). Evidently, to disseminate knowledge, these multi-word terms must be translated. However, noun compounds are often problematic, given the formation patterns in different languages and the syntactic-semantic complexity of these units (Sanz Vicente 2012). Thus, addressing the semantics of noun compounds is essential, since this is usually the basis for term formation in different languages. This paper describes the role of predicate-argument structures (i.e. micro-contexts) in Spanish neological noun compounds in the domain of wind power, since argument structure represents the interface between syntax and semantics. To this end, the micro-contexts of equivalent noun compounds in English and Spanish were compared. Our results showed that neological noun compounds in Spanish were formed according to the syntactic-semantic patterns of their English counterparts, which highlights the role of argument structure in term formation.
Juan Carlos Gil-Berrozpe
The organization of a terminological knowledge base (TKB) relies on the identification of relations between concepts. This involves making an inventory of semantic relations and extracting these relations from a corpus by means of knowledge patterns (KPs). In EcoLexicon, a multilingual and multimodal TKB on the environment, 17 semantic relations are currently being used to link environmental concepts. These relations include six subtypes of meronymy, but only one subtype of hyponymy (type_of). However, a recent pilot study (Gil-Berrozpe et al., in press) showed that the generic-specific relation could also be subdivided. Interestingly, these preliminary results indicated that hyponymy subtypes were constrained by the ontological nature of concepts, depending on whether they were entities or processes. The new proposal presented in this work expands the scope of the preliminary research on hyponymy subtypes to include concepts belonging to a wider range of semantic categories, and examines the behavior of knowledge patterns used to extract hyponymic relations. In this research, corpus analysis was used to explore the correlation of concepts in many different semantic categories with KPs as well as with hyponymy subtypes. Thanks to these constraints, it was possible to formulate a more comprehensive inventory of generic-specific relations in the environmental domain.
Silvia Montero Martínez
Abstract: Lexicographic and terminographic work has generally favored the study of nominal units over verbal units, even though verbs are key elements in the transmission of general and expert knowledge (L’Homme, 1998). This article proposes a methodology for classifying and describing verbal collocations according to their meaning, with the aim of facilitating the acquisition and encoding of expert knowledge. The working hypothesis is that classifying verbs into lexical domains and subdomains, following the Lexical Grammar Model (Faber & Mairal Usón, 1999), and identifying the requirements and restrictions of each subdomain according to the type of argument that the predicates activate, will make it possible to predict the meaning of the resulting verbal collocations and to establish generalizations within each subdomain by means of phraseological patterns (Montero Martínez, 2008). To this end, we propose a description of the arguments that takes into account their semantic category, the roles they activate (the thematic roles and macroroles of Role and Reference Grammar) (Van Valin & LaPolla, 1997; Van Valin, 2005), and their morphosyntactic structure. The resulting descriptions, illustrated for Spanish verbal collocations in the environmental field and represented in the EcoLexicon knowledge base, would be very valuable in lexicographic and terminographic resources intended for text production and knowledge acquisition by the user.
Beatriz Sánchez-Cárdenas
Research in terminology has traditionally focused on nouns, and considerably less attention has been paid to other grammatical categories such as adverbs. However, these words can also be problematic for novice translators, who tend to use the translation correspondences in bilingual dictionaries without realizing that formal equivalence is not necessarily the same as textual equivalence. In fact, semantic values acquired in context go far beyond dictionary meaning and are related to phenomena such as semantic prosody and lexical selection preferences, which can vary depending on text type and specialized domain. This research explored why certain adverbial discourse connectors that are apparently easy to translate are a source of translation problems that cannot be easily resolved with a bilingual dictionary. The study also analyzed the use of parallel corpora in the translation classroom and how it can increase the quality of text production. For this purpose, we compared student translations before and after the students received training on the use of corpus analysis tools.
Corpus analysis and the translation of adverbs in specialised texts (Beatriz Sánchez Cárdenas & Pamela Faber)
In teaching specialised translation, one of the challenges is to help students perceive the text as a whole. This difficulty is directly related to the way that each language culture tends to structure scientific discourse, as reflected in text types and their information structure. Interlinguistic and intertextual variation directly affects the use of logical connectors, syntax, and semantic prosody. In this sense, general language words can be a challenge for the translator of specialised texts, since their behaviour in general language texts differs from their behaviour in specialised texts. For example, in bilingual Spanish-English dictionaries, however/sin embargo, currently/actualmente, and inadequately/inadecuadamente are generally regarded as translation correspondences. Nevertheless, in specialised texts, this equivalence is more apparent than real because of the specific contextual constraints imposed by each of these words, which can vary, largely due to their semantic prosody. The main objective of this study was to assess the effectiveness of using comparable corpora for teaching specialised translation to undergraduate students, with a view to increasing the quality of their text production and their translation competence. We focused on the English-to-Spanish translation of conjunctive adverbs in specialised texts and evaluated whether a heightened awareness of interlinguistic variation helped to improve the students' translation skills and the quality of their translations. For this purpose, we performed an experiment in which undergraduate students were asked to translate a set of English medical text excerpts into Spanish, before and after receiving training sessions during which they learned how to compile, analyse, and exploit comparable specialised corpora using tools such as CORPES (http://www.rae.es/recursos/banco-de-datos/corpes-xxi) and Sketch Engine (http://www.sketchengine.co.uk).
Special emphasis was placed on the usefulness of these resources for the study of syntax and semantics, particularly with regard to the adverbs of the study: actually and unfortunately in English and their possible equivalents in Spanish, namely, realmente, justamente, precisamente, desafortunadamente, and desgraciadamente. The study specifically targeted the translation of adverbial modification in the form of logical connectors. In the learning phase of the experiment, students received a detailed explanation of semantic prosody and what it entailed. In the course of this study, students became more sensitive to the problem of translating such words in specialised texts. When the translations produced before and after the training sessions were compared, the results showed that the students had gained a heightened awareness of the difficulty of translating these apparently simple general language words and were less apt to choose the conventional dictionary correspondence as a translation solution. Thus, in the second phase of the experiment, there was significantly greater variation in the translation correspondences found for each adverb. As a result, some of these translations reflected the contextual and pragmatic value of the adverbs rather than their lexicographic equivalence.
Spanish abstract: One of the main challenges of teaching specialised translation is helping students to perceive the text as a global, unified whole. This difficulty is closely related to the particularities of each language culture when it comes to structuring information in the different text types of scientific discourse. Interlinguistic and intertextual variation directly influences the use of logical connectors, syntax, semantics, and prosody. As a result, certain general language words can be an obstacle for the translator of specialised texts, since their use differs from that in general language.
This is the case of many word pairs that bilingual dictionaries treat as equivalents: however/sin embargo, currently/actualmente, and inadequately/inadecuadamente. Although this holds in many general language cases, in specialised texts the correspondence is not so obvious. Even if the core meaning of these adverbs is equivalent, the contextual constraints that each imposes in its respective language do not match, largely because they have a different semantic prosody. In this study, we determined how effective the use of comparable corpora is for teaching specialised translation to undergraduate students, with the aim of increasing the quality of their text production and translation competence. We focused on the English-to-Spanish translation of conjunctive adverbs that appear in specialised texts. We evaluated whether the increased linguistic awareness developed through corpus use helped students to improve their translation competence and the quality of their translations. To this end, we carried out an experiment in which students were asked to translate a series of medical text excerpts from English into Spanish before and after receiving specific training on the compilation, analysis, and use of comparable specialised corpora by means of tools such as CORPES and Sketch Engine. Special attention was paid to the use of these resources for studying the syntax and semantics of the adverbs under study, namely actually and unfortunately in English and their possible Spanish equivalents, such as realmente, justamente, precisamente, desafortunadamente, or desgraciadamente.
Specifically, we were interested in the translation of adverbs that serve to structure discourse logically. In the training phase of the experiment, students were made aware of the concept of semantic prosody. Over the course of these sessions, the students became more perceptive of the translation problems that these words pose in specialised texts. When we compared the translations produced before and after the training sessions, we observed, on the one hand, that the students became better able to perceive the difficulties involved in translating these words, which seem so simple at first sight. On the other hand, the results show that the students learned to better assess the appropriateness of the translations proposed in dictionaries, since in the second phase of the experiment we observed an increase in the number and variety of translations proposed for each adverb. Consequently, some of the translations reflected the contextual and pragmatic values of the adverbs rather than their lexicographic equivalence. Keywords: specialised translation, corpus analysis, linguistic awareness, comparable corpus, translation teaching and learning.
Juan Carlos Gil-Berrozpe
Hyponymy, or the type_of relation, is the backbone of all hierarchical semantic configurations. Although recent work has focused on other relations such as meronymy and causality, hyponymy maintains its special status since it implies property inheritance. As reflected in EcoLexicon, a multilingual terminological knowledge base on the environment, conceptual relations are a key factor in the design of an internally and externally coherent concept system. Terminological knowledge bases can strengthen their coherence and dynamicity when the set of conceptual relations is wider than the typical generic-specific and part-whole relations, which entails refining both the hyponymy and meronymy relations. This paper analyzes how hyponymy is built in the EcoLexicon knowledge base and discusses the problems that can ensue when the type_of relation is too simplistically defined or systematically represented. As a solution, this paper proposes the following: (i) the correction of property inheritance; (ii) the specification of different subtypes of hyponymy; and (iii) the creation of 'umbrella concepts'. This paper focuses on the first two solutions and proposes a set of parameters that can be used to decompose hyponymy.
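Property inheritance along type_of links, together with one possible way of correcting it locally, can be sketched as follows. This Python example assumes an acyclic hierarchy, represents properties as plain strings, and uses a hypothetical `blocked` mapping to cancel an inherited property; `inherited_properties` and all concepts and properties are illustrative toy data, and neither the paper's own correction procedure nor its hyponymy subtypes are modeled here.

```python
def inherited_properties(concept, parents, local_props, blocked=None):
    """Collect the properties of `concept` by walking its type_of ancestors.

    parents:     concept -> list of hypernyms (multiple inheritance allowed;
                 the hierarchy is assumed to be acyclic)
    local_props: concept -> set of properties asserted directly on it
    blocked:     concept -> set of inherited properties that it cancels
                 (a hypothetical correction mechanism for this sketch)
    """
    blocked = blocked or {}
    props = set()
    for parent in parents.get(concept, []):
        props |= inherited_properties(parent, parents, local_props, blocked)
    props -= blocked.get(concept, set())   # correct inappropriate inheritance
    props |= local_props.get(concept, set())
    return props


# Hypothetical toy hierarchy.
PARENTS = {
    "watercourse": ["water body"],
    "canal": ["watercourse", "artificial structure"],
    "ephemeral stream": ["watercourse"],
}
LOCAL = {
    "water body": {"contains water"},
    "watercourse": {"flows"},
    "artificial structure": {"built by humans"},
    "ephemeral stream": {"flows intermittently"},
}
BLOCKED = {"ephemeral stream": {"flows"}}

if __name__ == "__main__":
    print(sorted(inherited_properties("canal", PARENTS, LOCAL, BLOCKED)))
    print(sorted(inherited_properties("ephemeral stream", PARENTS, LOCAL, BLOCKED)))
```

Cancelling at the level of the subordinate concept keeps the hypernym's properties intact for its other hyponyms, which is one reason this kind of local override is a common design choice.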
Multidimensionality is the phenomenon by which the characteristics of a certain concept may vary depending on the perspective taken. Undoubtedly, the representation of multidimensionality is a major challenge in the design of terminological knowledge bases (TKBs), since extracting a few concepts and establishing simple relations between them results in monodimensional systems. EcoLexicon, based on Frame-Based Terminology (FBT), is a multidimensional and dynamic TKB on environmental science that targets user knowledge acquisition through linguistic, conceptual, and graphical information. Despite all the advantages that EcoLexicon provides, its vast amount of information is a double-edged sword that occasionally affects its representation of multidimensional knowledge, causing problems that include information overload, excessive noise and redundancy, and transitivity inconsistencies in conceptual relations. To solve these problems, this final degree project proposes an extension of the conceptual systems in EcoLexicon by refining hyponymy in three ways: (i) correcting property inheritance, (ii) implementing umbrella concepts, and (iii) establishing hyponymy subtypes. Moreover, this project carries out a process of hyponymic extension through corpus extraction, semi-automatically retrieving hyponyms from the EcoLexicon database by means of customized word sketches and subsequently complementing and validating the new hierarchies.
Pamela Faber
Dynamicity is the condition of being in motion and is thus characterized by continuous change, activity, or progress. Not surprisingly, dynamicity is generally acknowledged to be an important part of any kind of knowledge representation system or knowledge acquisition scenario. It might therefore be worthwhile to reconsider concept representations in Terminology and modify them so that they better reflect the nature of conceptualization in the mind and brain. In this sense, recent theories of cognition have emphasized that situated or grounded experiences are activated in cognitive processing (Louwerse and Jeuniaux 2010; Barsalou 1999; Zwaan 2003). According to these theories, meaning construction heavily relies on perceptually simulating the information that is presented to the comprehender. Specialized knowledge representation that facilitates knowledge acquisition could thus be conceived as a situation model or event that enables comprehenders to use communicated information to better interact with the world.
Terminology work involves the collection, analysis and distribution of terms. This is essential for a wide range of activities, such as technical writing and communication, knowledge acquisition, specialized translation, knowledge resource development and information retrieval. However, these activities cannot be performed randomly, but should be based on a systematic set of theoretical principles that reflect the cognitive and linguistic nature of terms as access points to larger knowledge configurations. “Frame-Based Terminology” (FBT) is a cognitive approach to terminology that is based on frame-like representations in the form of conceptual templates underlying the knowledge encoded in specialized texts (Faber 2011, 21; 2012; Faber et al. 2007, 42). FBT frames can be regarded as situated knowledge structures and are linguistically reflected in the lexical relations codified in terminographic definitions. These frames are the context in which FBT specifies the semantic, syntactic and pragmatic behaviour of specialised language units. They are based on the following set of micro-theories: (1) a semantic micro-theory; (2) a syntactic micro-theory and (3) a pragmatic micro-theory. Each micro-theory is related to the information encoded in term entries, the relations between specialised knowledge units and the concepts that they designate. Keywords: Terminology theory; Cognitive semantics; Concept modelling; Frames
Antonio San Martín
Noun compounds (NCs) are semantically complex and, contrary to what is often assumed, not fully compositional. This paper presents a pilot study on the semantic annotation of environmental NCs with a view to accessing their semantics and exploring their domain-based contextual variation. Our results showed that the semantic annotation of NCs afforded important insights into how context impacts their conceptualization.
Pamela Faber
EcoLexicon is a multilingual terminological knowledge base (TKB) on the environment, which provides an internally coherent information system covering a wide range of specialized linguistic and conceptual needs. Our research has mainly focused on conceptual modeling with a view to offering a user-friendly multimodal interface. The dynamic interface of EcoLexicon combines conceptual, linguistic, and graphical information and is primarily hosted in a relational database that has been recently linked to an ontology. One of the main challenges that we have faced in the development of our TKB is the information overload generated by the specialized domain. This is not only due to the wide scope and applicability of environmental concepts, but especially to the fact that multiple dimensions of their meaning definition or conceptual description are not always compatible but
Brain-imaging techniques can be applied in specialized language research to provide insights into how specialized concepts are represented and processed in the brain. The fMRI study described in this paper focused on general and specialized lexical units and the perception of semantic meaning by expert geologists and non-geologists. The subjects performed semantic matching tasks and made decisions with regard to specialized terms designating specialized tools and general language words designating familiar household utensils. The linguistic processing of specialized terms was found to be modulated by the individual’s previous experience with the objects. These results strengthen the hypothesis that, when performing a domain-specific task, experts activate different brain systems from novices. This provides data regarding which brain systems are involved in cognitive processes.
Though instrumental in numerous disciplines, context has no universally accepted definition. In specialized knowledge resources, it is timely and necessary to parameterize context with a view to more effectively facilitating knowledge representation, understanding, and acquisition, the main aims of terminological knowledge bases. This entails distinguishing different types of context as well as how they interact with each other. This is not a simple objective to achieve, even though specialized discourse does not have as many contextual variables as general language (e.g. figurative meaning, irony). Even in specialized text, context is an extremely complex concept. In fact, contextual information can be specified in terms of scope or according to the type of information conveyed. It can be a textual excerpt or a whole document; a pragmatic convention or a whole culture; a concrete situation or a prototypical scenario. Although these versions of context are useful for the users of terminological resources, such resources rarely support context modeling. In this paper, we propose a taxonomy of context primarily based on scope (local and global) and further divided into syntactic, semantic, and pragmatic facets. These facets cover the specification of different types of terminological information, such as predicate-argument structure, collocations, semantic relations, term variants, grammatical and lexical cohesion, communicative situations, subject fields, and cultures.
Pilar León Araúz
Despite advances in computer technology, terminologists still tend to rely on manual work to extract all the semantic information that they need for the description of specialized concepts. In this paper we propose the creation of new word sketches in Sketch Engine for the extraction of semantic relations. Following a pattern-based approach, new sketch grammars are developed in order to extract some of the most common semantic relations used in the field of terminology: generic-specific, part-whole, location, cause and function.
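Pattern-based relation extraction can be approximated outside Sketch Engine with regular expressions over raw text. This Python sketch is a deliberate simplification and not the paper's sketch grammars, which operate over part-of-speech-tagged corpora: here `KNOWLEDGE_PATTERNS` is a hypothetical list of regexes that capture single words only, and the relation labels and example sentence are illustrative.

```python
import re

# Hypothetical knowledge patterns: (relation, regex, swap). Each regex captures
# two single-word terms; swap=True means group 1 is the *second* argument of
# the resulting (x, relation, y) triple.
KNOWLEDGE_PATTERNS = [
    ("type_of", r"(\w+) such as (\w+)", True),                 # GENERIC such as SPECIFIC
    ("type_of", r"(\w+) (?:is|are) a type of (\w+)", False),   # SPECIFIC is a type of GENERIC
    ("part_of", r"(\w+) (?:is|are) part of (?:the |an? )?(\w+)", False),
    ("part_of", r"(\w+) consists? of (\w+)", True),            # WHOLE consists of PART
    ("located_at", r"(\w+) (?:is|are) located in (?:the |an? )?(\w+)", False),
    ("causes", r"(\w+) causes? (\w+)", False),                 # CAUSE causes EFFECT
    ("causes", r"(\w+) (?:is|are) caused by (\w+)", True),     # EFFECT is caused by CAUSE
    ("has_function", r"(\w+) (?:is|are) used to (\w+)", False),
]


def extract_relations(text):
    """Return the set of (x, relation, y) triples matched by the patterns."""
    triples = set()
    for relation, pattern, swap in KNOWLEDGE_PATTERNS:
        for m in re.finditer(pattern, text.lower()):
            x, y = m.group(1), m.group(2)
            triples.add((y, relation, x) if swap else (x, relation, y))
    return triples


if __name__ == "__main__":
    sample = ("Sediments such as sand accumulate on beaches. The berm is part of "
              "the beach. Erosion is caused by waves. A groyne is used to trap sediment.")
    for triple in sorted(extract_relations(sample)):
        print(triple)
```

The `swap` flag lets several surface patterns map onto one canonical direction per relation, so that, for example, "X causes Y" and "Y is caused by X" yield the same triple.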
The multimodal knowledge base EcoLexicon includes images to enrich conceptual description and enhance knowledge acquisition. These images have been selected according to the conceptual propositions contained in the definitional templates of concepts. Although this ensures coherence and systematic selection, the images are related to each specific concept and are not annotated according to other possible conceptual propositions contained in the image. Our aim is to create a separate repository for images, annotate all the knowledge contained in each of them, and then link them to all concept entries that contain one or more of these propositions in their definitional template. This would not only improve the internal coherence of EcoLexicon but also improve the reusability of the selected images and avoid duplicating workload. The first step in this process, and the objective of the research described here, is to evaluate the images already contained in EcoLexicon to determine whether they were adequately selected in the first place, how knowledge is conveyed through the morphological features of the image, and whether they can be reused for other concept entries. This analysis has provided preliminary data to further explore how concept type, conceptual relations, and propositions affect the relation between morphological features and the image types chosen for visual knowledge representation.
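The proposed repository links each annotated image to every concept entry whose definitional template shares at least one proposition with the image. A minimal Python sketch of that linking rule follows, assuming propositions are modeled as (subject, relation, object) triples; the `link_images` function, file names, and propositions are hypothetical illustrations, not EcoLexicon data.

```python
from collections import defaultdict


def link_images(image_annotations, definitional_templates):
    """Link every concept entry to each image that shares at least one
    proposition with the concept's definitional template.

    Propositions are assumed to be (subject, relation, object) triples.
    """
    links = defaultdict(set)
    for concept, template in definitional_templates.items():
        for image, propositions in image_annotations.items():
            if template & propositions:  # non-empty intersection of propositions
                links[concept].add(image)
    return dict(links)


# Hypothetical annotated image repository.
IMAGES = {
    "delta.jpg": {("delta", "located_at", "river mouth"), ("delta", "made_of", "sediment")},
    "groyne.jpg": {("groyne", "has_function", "trap sediment")},
}

# Hypothetical definitional templates of concept entries.
TEMPLATES = {
    "DELTA": {("delta", "located_at", "river mouth"), ("delta", "made_of", "sediment")},
    "SEDIMENT": {("delta", "made_of", "sediment")},
    "GROYNE": {("groyne", "has_function", "trap sediment")},
    "DUNE": {("dune", "made_of", "sand")},
}

if __name__ == "__main__":
    for concept, images in link_images(IMAGES, TEMPLATES).items():
        print(concept, sorted(images))
```

Here the single image `delta.jpg` is reused by two entries, DELTA and SEDIMENT, which illustrates how annotating propositions once could avoid duplicating image selection work.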
Antonio San Martín
Definitions are one of the most important components of any quality terminological resource and a privileged means of representing knowledge, since they offer a direct natural-language explanation of the content of a concept. The adequacy of definitions largely determines the overall usefulness of the resource for the user. This study is motivated by the observation that terminological definitions often do not meet users' needs. In this doctoral thesis, we apply premises from cognitive linguistics (Lakoff 1987; Langacker 1987; Croft and Cruse 2004; Evans and Green 2006, inter alia) to terminological definition and present a proposal that we call the flexible terminological definition. It consists of a system of definitions of the same concept, composed of a general definition (in our case, one that covers the environmental domain as a whole) together with additional definitions that describe the concept specifically from the point of view of the different subdomains in which the concept is relevant. Within cognitive linguistics, our proposal is mainly rooted in Frame-based Terminology (Faber et al. 2006, 2009; León Araúz 2009; Faber 2012, 2014), as well as in theories of grounded cognition (Barsalou 1993, 1999, 2003), frame semantics (Fillmore 1976, 1977, 1982; Fillmore and Atkins 1992), prototype theory (Rosch 1975; Rosch 1978; Rosch and Mervis 1975; Rosch et al. 1976), and the theory theory (Murphy and Medin 1985; Murphy 1993, 2000).
Since cognitive linguistics shows that context is a determining factor in the construction of the meaning of any lexical unit, including terminological units, we assume that terminological definitions can and should reflect the effects of context, even though definitions have traditionally been understood as the expression of meaning stripped of contextual effects. The main objective of this doctoral thesis is to analyze the effects of contextual variation on specialized environmental concepts with a view to their representation in terminological definitions. In particular, we focus on contextual variation based on thematic constraints, that is, on how the different fields of knowledge that make up the vast environmental domain conceptualize the same concepts differently and how this can be reflected in the definition. To achieve the objectives of this thesis, we carried out an empirical study consisting of the analysis of a set of contextually variable concepts and the formulation of the flexible definitions of two of them, each of which had different contextual characteristics. As a result of the first part of our empirical study, we divided our notion of domain-dependent contextual variation into three different phenomena (inspired by Cruse [2011]): modulation, perspectivization, and subconceptualization. All concepts undergo modulation, some are also perspectivized, and, finally, a small number of concepts undergo subconceptualization. In the second part, we apply these notions to terminological definition and show how to build flexible definitions, from knowledge extraction to the writing of the definition itself.
This doctoral thesis contributes to improving the quality of terminological definitions because our approach provides users with a definition adapted to the domain of their choice, thus multiplying the likelihood that the definition will offer the information they need. Moreover, flexible terminological definitions provide a representation of knowledge that resembles the human conceptual system more closely than traditional definitions do. Thus, a flexible definition not only provides more relevant information but also does so in a way that facilitates and potentially improves knowledge acquisition.
This paper presents a pilot study that tested whether a list generated by the term extractor TermoStat Web 3.0 from a corpus of KWIC (Key Word in Context) concordance lines for the term designating a given concept to be defined can be an effective source of definitional information. For this purpose, a term list generated from an English corpus of specialized environmental definitions of the concept MAGMA was used as a reference. In order to minimize the interference from terminological variation, the terms in the reference list were categorized according to their conceptual relation with MAGMA. This was easily accomplished since TermoStat Web 3.0 allows the user to consult the concordances of a term on the list so as to determine its conceptual relation with the concept to be defined. Afterwards, the reference list was compared to the term lists generated from five corpora of KWIC concordance lines of magma, each with a different number of characters, to determine the best context length based on precision and recall. This also permitted us to derive preliminary conclusions regarding the usefulness of KWIC corpora. The results indicate that a 250-character KWIC corpus coupled with a term list generated from it could be a useful tool for the formulation of definitions, either as a complement to a definitions corpus or as a substitute for it when one is not available.
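The comparison of candidate term lists against the reference list can be sketched as a standard precision/recall computation. In this Python sketch, the functions `precision_recall`, `f1`, and `best_context_length` are hypothetical, combining precision and recall into F1 to pick a single "best" length is an assumption of this sketch rather than the study's stated criterion, and the term lists are toy data constructed so the arithmetic is easy to follow, not the study's results.

```python
def precision_recall(extracted, reference):
    """Precision and recall of an extracted term list against a reference list."""
    extracted, reference = set(extracted), set(reference)
    hits = extracted & reference
    precision = len(hits) / len(extracted) if extracted else 0.0
    recall = len(hits) / len(reference) if reference else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def best_context_length(term_lists, reference):
    """Return the KWIC context length whose term list has the highest F1."""
    return max(term_lists, key=lambda n: f1(*precision_recall(term_lists[n], reference)))


# Toy data: reference terms and term lists keyed by KWIC context length in characters.
REFERENCE = {"lava", "volcano", "mantle", "crust", "eruption"}
TERM_LISTS = {
    100: {"lava", "volcano", "rock"},
    250: {"lava", "volcano", "mantle", "crust", "rock"},
    500: {"lava", "volcano", "mantle", "crust", "rock", "water", "ash", "city"},
}

if __name__ == "__main__":
    for length, terms in TERM_LISTS.items():
        p, r = precision_recall(terms, REFERENCE)
        print(f"{length} chars: precision={p:.2f} recall={r:.2f} f1={f1(p, r):.2f}")
    print("best:", best_context_length(TERM_LISTS, REFERENCE))
```

With these toy lists, a short context misses reference terms (low recall), while a long context adds unrelated terms (low precision), which is the trade-off that comparing several context lengths is meant to expose.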