January 2006
The main goal of this paper is to present a first approach to the automatic detection of conceptual relations between two terms in specialised written text. Previous experiments based on manual analysis led the authors to implement an automatic query strategy that combines the term candidates proposed by an extractor with a list of verbal syntactic patterns used to refine the relations. The next step in the research will be to integrate the results into the term extractor in order to obtain more restricted pieces of information that can be directly reused for the ontology building task.
In this paper the authors present a strategy for obtaining specialised knowledge fragments that contain terms together with the conceptual relations among them, that is, the skeleton of a text that could be schematised by means of a concept map and, hopefully, reused to enrich a domain-dependent ontology. These terms and relations will be detected in written texts from the genomics domain. The results presented in this paper have been obtained for Catalan, but we are already implementing the same methodology for specialised texts in Spanish, and English will also be considered in the near future.
Roughly speaking, our proposal shows one of the methodologies used to achieve conceptual mapping from texts, and it includes two different and complementary strategies. On the one hand, we have used a term extractor (YATE) to obtain the term candidates in genomics domain texts. YATE has been tuned to cover the specialised field by enlarging and refining some domain-dependent information. At the same time, the improvement of YATE has contributed to the enlargement of EuroWordNet (a wide-coverage, general-purpose lexico-semantic ontology) with new synsets.
On the other hand, we have reviewed the traditional classification of conceptual relations from the point of view of different (but closely related) disciplines, such as terminology, linguistics, ontologies and lexical semantics. After a validated experiment, we have proposed a closed typology of conceptual relations comprising seven main types of links that may relate the terms used in any domain, and therefore also in genomics. In language, these conceptual relations are expressed by verbal markers, usually accompanied by prepositions, among other language-specific mechanisms not used in our experiments. These patterns have been compared against the information contained between two different terms in order to tag specialised knowledge fragments.
After a brief state of the art on conceptual relations and on the strategies for automatically detecting these links, the paper includes a preliminary analysis of the verbal markers with regard to precision and noise. Patterns manually detected in a sample corpus have allowed the authors to explore and implement an automatic query system, which has been progressively refined. Some illustrative and relevant contexts are highlighted in the results section, along with precision figures for each verbal pattern conveying a particular conceptual relation. Finally, the integration into YATE of the results obtained using a KWIC query interface is described in the future research lines, before the paper briefly concludes.
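The query strategy described above, which looks for a verbal marker in the text between two term candidates, can be sketched roughly as follows. This is only an illustrative sketch under several assumptions and not the authors' implementation: the term candidates are passed in as a plain list of strings (YATE's real output format is not described here), the verbal markers are hypothetical English expressions rather than the Catalan patterns used in the experiments, the two relation labels are placeholders and not the paper's seven-type typology, and `max_gap` is an invented parameter limiting how far apart two terms may be.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationHit:
    left_term: str
    relation: str
    right_term: str
    fragment: str  # the tagged knowledge fragment: term + marker + term


# Hypothetical verbal markers (verb plus preposition), in English for
# readability. Neither the labels nor the markers come from the paper.
VERBAL_PATTERNS = {
    "part-whole": [r"consists? of", r"is part of"],
    "causality": [r"causes?", r"leads? to"],
}


def find_relations(sentence, term_candidates, patterns=VERBAL_PATTERNS, max_gap=40):
    """Return fragments where a verbal marker occurs between two terms."""
    # Locate every occurrence of every term candidate in the sentence.
    spans = []
    for term in term_candidates:
        term_re = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(term_re, sentence, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), term))
    spans.sort()

    hits = []
    for i, (start1, end1, term1) in enumerate(spans):
        for start2, end2, term2 in spans[i + 1:]:
            if start2 < end1:
                continue  # overlapping occurrences are not a term pair
            between = sentence[end1:start2]
            if len(between) > max_gap:
                continue  # assumed cut-off: terms too far apart
            for relation, markers in patterns.items():
                for marker in markers:
                    marker_re = r"\b(?:" + marker + r")\b"
                    if re.search(marker_re, between, flags=re.IGNORECASE):
                        fragment = sentence[start1:end2]
                        hits.append(RelationHit(term1, relation, term2, fragment))
                        break  # one hit per relation type and term pair
    return hits


def precision(judgements):
    """Share of retrieved fragments judged correct by a human reviewer."""
    return sum(judgements) / len(judgements) if judgements else 0.0
```

For example, `find_relations("The genome consists of chromosomes.", ["genome", "chromosomes"])` yields a single "part-whole" hit under these assumed patterns. A per-pattern precision figure of the kind mentioned above could then be computed by passing the manual correct/incorrect judgements for that pattern's hits to `precision`.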