SKELETON: Specialised knowledge retrieval on the basis of terms and conceptual relations
January 2006


Judit Feliu



M. Teresa Cabré

The main goal of this paper is to present a first approach to the automatic detection of conceptual relations between two terms in specialised written text. Previous experiments based on manual analysis led the authors to implement an automatic query strategy that combines the term candidates proposed by an extractor with a list of verbal syntactic patterns used to refine the relations. The next step in the research will be to integrate the results into the term extractor in order to obtain more restricted pieces of information that can be directly reused for ontology building. In this paper the authors show a strategy designed to obtain specialised knowledge fragments containing terms together with the conceptual relations among them, that is, the skeleton of a text, which could be schematised by means of a concept map and, hopefully, reused to enrich a domain-dependent ontology. These terms and relations are detected in written texts from the genomics domain. The results presented here have been obtained for Catalan, but we are already implementing the same methodology for specialised texts in Spanish, and English will also be considered in the near future. Roughly speaking, our proposal shows one of the methodologies used to achieve conceptual mapping from texts, and it includes two different and complementary strategies. On the one hand, we have used a term extractor (YATE) to obtain the term candidates in genomics texts. YATE has been tuned to cover the specialised field by enlarging and refining some domain-dependent information. At the same time, the improvement of YATE has contributed to the enlargement of EuroWordNet (a wide-coverage, general-purpose lexico-semantic ontology) with new synsets.
On the other hand, we have reviewed the traditional classification of conceptual relations from the point of view of different (but closely related) disciplines, such as terminology, linguistics, ontologies and lexical semantics. After a validated experiment, we have proposed a closed typology of conceptual relations comprising seven main types of links that may relate the terms used in any domain, and therefore also in genomics. These conceptual relations are expressed in language by verbal markers, usually accompanied by prepositions, among other language-specific mechanisms not used in our experiments. These patterns have been applied in order to compare the information contained between two different terms and to tag specialised knowledge fragments. After a brief state of the art on conceptual relations and on strategies for automatically detecting these links, the paper includes a preliminary analysis of the verbal markers in terms of precision and noise. Patterns detected manually in a sample corpus have allowed the authors to explore and implement an automatic query system, which has been progressively refined. Some illustrative and relevant contexts are highlighted in the results section, together with precision figures for each verbal pattern conveying a particular conceptual relation. The integration into YATE of the results obtained with a KWIC query interface is described in the future research lines, before the paper's brief conclusion.
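The query strategy described in this abstract (term candidates combined with verbal patterns that sit between two terms) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function name `find_relations`, the three relation labels, and the English markers are hypothetical. The paper works on Catalan, uses YATE for the term candidates, and proposes seven relation types that are not listed in this abstract.

```python
import re
from itertools import permutations

# Hypothetical verbal markers (verb plus preposition) mapped to illustrative
# relation labels. These are NOT the seven types proposed in the paper.
VERBAL_PATTERNS = {
    "part-of": r"(?:is|are) part of|belongs? to",
    "causality": r"causes?|leads? to|results? in",
    "similarity": r"(?:is|are) similar to|resembles?",
}


def find_relations(sentence, terms):
    """Return (term1, relation, term2) triples found in one sentence.

    A triple is reported only when a verbal marker, optionally followed by
    an article, is the sole material between the two term candidates.
    """
    triples = []
    for t1, t2 in permutations(terms, 2):
        for relation, marker in VERBAL_PATTERNS.items():
            pattern = (
                rf"\b{re.escape(t1)}\b\s+(?:{marker})\s+"
                rf"(?:the\s+|a\s+|an\s+)?\b{re.escape(t2)}\b"
            )
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                triples.append((t1, relation, t2))
    return triples


if __name__ == "__main__":
    # Term candidates would come from a term extractor; here they are given by hand.
    print(find_relations("The nucleus is part of the cell.", ["nucleus", "cell"]))
```

Requiring the marker to be the only text between the two terms favours precision over recall, which is in the spirit of the precision and noise analysis the abstract mentions; a real system would need looser contexts and language-specific patterns.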


Figure 2. Conceptual relations assignment to the 'cell' concept.  
The GENOMA-KB project: towards the integration of concepts, terms, textual corpora and entities

January 2004


M. Teresa Cabré






In the past twenty years much effort has been devoted to the development of ontologies and term bases for different fields. All this work has been done separately or with only slight integration. GENOMA-KB is a project whose main aim is to integrate at least these two resources. In this paper, the most relevant aspects of the project are presented. Each module is described individually and the links among the modules are highlighted. Finally, a query system for interrogating the knowledge base is briefly introduced.

Figure 1: Human Genome Knowledge Base Project: an overview
Table 2 indicates the most salient parameters of both resources.
Towards an Ontology for a Human Genome Knowledge Base

January 2002


An ontology, usually understood as a particular representation of a given domain, will become an essential component of the information retrieval system we aim to build. Our research is carried out within the communicative terminology framework, that is, we mainly deal with units actually found in specialized discourse. Bearing this theoretical approach in mind, we consider it essential to establish a link between the specialized knowledge units appearing in specialized texts and the concepts organized in a particular ontology. Having the specialized knowledge units closely linked to a conceptual organization will lead us to propose an information retrieval system based on a Human Genome Ontology that should perform better than the current state-of-the-art systems.

Bases cognitivas de la terminología: Hacia una visión comunicativa del concepto [Cognitive bases of terminology: towards a communicative view of the concept]

January 2001


0. Introduction. A question that has only very recently begun to be asked is why linguistics takes no interest in the study of so-called terminological units when they are in fact, in our view, units of languages. The answer lies in three factors: first, the perception that the two fields of knowledge have had of each other; second, the rigidity with which both fields have defined their object and set the limits of their study; and third, the desire to assert themselves as independent fields. In this paper we set out to analyse the reasons for the separation and mutual unawareness between terminology and linguistics, in order to show that so-called terminological units are an indispensable element in accounting for speakers' competence and performance, so that any descriptive theory of natural languages must necessarily integrate them. If, moreover, this theory seeks to attain explanatory value, it must try to account for how terms and words interrelate, taking their similarities into account while respecting their peculiarities. To this end, only the choice of a theory that actively includes semantics and pragmatics at all its levels of description, and is not limited to the formal aspects of languages, can guarantee from the outset the integration of terms and words and thus overcome the separation of terminology and linguistics. It will therefore be a theory's starting assumptions and its modes of organisation that facilitate the integration of terminological units into the field of study of natural languages.

Conceptual relations in specialized texts: new typology and an extraction system proposal

Conceptual relations, a basic item in terminology, have traditionally been used without taking specialized texts into account. If a conceptual relation is considered as an element that links two or more specialized knowledge units in a text from a particular subject field, it has to be recognised that the number and diversity of these relations increase. In order for terminology to emerge automatically through relations, it is necessary to define and characterize a complete set of linguistic markers that materialize the different types of relationships.

Figure 1: General structure of the GENOMA-KB 
Figure 2. Extract from ontology tree 
Figure 3. Data sources for the term database 
The GENOMA-KB project: a concept based term enlargement system

The GENOMA-KB knowledge base includes four independent modules: a textual database, a factual database, a terminological database and an ontology. We will briefly introduce in this paper the main features concerning each one of the modules, and we will highlight the process of enlarging both the term base and the ontology.
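As a rough illustration of the four-module layout named in this abstract, the following sketch models a textual database, a factual database, a terminological database and an ontology as one container. All class, field and method names are hypothetical, and the rule that a term may only be added when it points at an existing concept is an assumption about how concept-based term enlargement might work, not a description of the actual GENOMA-KB implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    concept_id: str
    label: str
    parent_id: Optional[str] = None  # position in the ontology tree


@dataclass
class TermEntry:
    term: str
    language: str
    concept_id: str  # assumed link from a term to a concept


@dataclass
class KnowledgeBase:
    texts: Dict[str, str] = field(default_factory=dict)         # textual database
    facts: Dict[str, str] = field(default_factory=dict)         # factual database
    terms: List[TermEntry] = field(default_factory=list)        # terminological database
    ontology: Dict[str, Concept] = field(default_factory=dict)  # ontology

    def add_term(self, entry: TermEntry) -> None:
        """Enlarge the term base, refusing terms with no matching concept."""
        if entry.concept_id not in self.ontology:
            raise KeyError(f"unknown concept: {entry.concept_id}")
        self.terms.append(entry)

    def terms_for_concept(self, concept_id: str) -> List[str]:
        """Return every term linked to the given concept, in insertion order."""
        return [e.term for e in self.terms if e.concept_id == concept_id]
```

Keeping the four modules as separate fields mirrors their independence in the description, while the shared `concept_id` is the only point where the term base and the ontology touch.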

Ontologies: a review

In this paper, we analyse the main ontologies in order to draw a general overview of one of the most widely used tools for structuring knowledge. First, we present a broad description of the five most widespread ontologies among the scientific community devoted to information management. Next, we briefly review some of the management tools used to create and update ontologies. Finally, we present some conclusions regarding the selection of an ontology and a management system for use within the current projects of the IULATERM group.

