Conference Paper

An Ontology-Based Index to Retrieve Documents with Geographic Information.

DOI: 10.1007/978-3-540-69497-7_25
Conference: Scientific and Statistical Database Management, 20th International Conference, SSDBM 2008, Hong Kong, China, July 9-11, 2008, Proceedings
Source: DBLP

ABSTRACT Both Geographic Information Systems and Information Retrieval have been very active research fields in recent decades. Recently, a new research field called Geographic Information Retrieval has emerged at the intersection of these two fields. Its main goal is to define index structures and techniques to efficiently store and retrieve documents using both the text and the geographic references contained within the text.

In this paper we present a new index structure that combines an inverted index, a spatial index, and an ontology-based structure. This structure improves on the query capabilities of other proposals. In addition, we describe the architecture of a system for geographic information retrieval that uses this new index structure. This architecture defines a workflow for extracting the geographic references contained in documents.
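
The abstract does not give implementation details, so the following is only a minimal sketch of how an inverted index, a spatial lookup, and a place hierarchy might be combined. Every name in it (PlaceNode, GeoTextIndex, the bounding boxes, the sample documents) is hypothetical and not taken from the paper. It assumes each child place's bounding box lies inside its parent's, and it uses a plain tree of boxes where a real system would likely use a dedicated spatial index.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PlaceNode:
    """One place in a hypothetical spatial ontology (a simple containment tree)."""
    name: str
    bbox: tuple  # (min_x, min_y, max_x, max_y), illustrative coordinates only
    children: list = field(default_factory=list)
    doc_ids: set = field(default_factory=set)


def intersects(a, b):
    """Return True if two axis-aligned boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class GeoTextIndex:
    """Sketch: inverted index for terms plus a place tree for geographic lookups."""

    def __init__(self, root):
        self.root = root
        self.inverted = defaultdict(set)  # term -> ids of documents containing it
        self.by_name = {}                 # lower-cased place name -> PlaceNode
        stack = [root]
        while stack:
            node = stack.pop()
            self.by_name[node.name.lower()] = node
            stack.extend(node.children)

    def add_document(self, doc_id, text, place_names):
        """Index the words of a document and attach it to the places it cites."""
        for term in text.lower().split():
            self.inverted[term].add(doc_id)
        for name in place_names:
            self.by_name[name.lower()].doc_ids.add(doc_id)

    def docs_in_window(self, window):
        """Documents citing a place whose box overlaps the query window."""
        found, stack = set(), [self.root]
        while stack:
            node = stack.pop()
            # Assumption: children lie inside the parent, so a miss prunes the subtree.
            if intersects(node.bbox, window):
                found |= node.doc_ids
                stack.extend(node.children)
        return found

    def docs_under_place(self, name):
        """Documents citing a named place or any place contained in it."""
        found, stack = set(), [self.by_name[name.lower()]]
        while stack:
            node = stack.pop()
            found |= node.doc_ids
            stack.extend(node.children)
        return found

    def query(self, term, window):
        """Documents that contain the term and cite a place overlapping the window."""
        return self.inverted.get(term.lower(), set()) & self.docs_in_window(window)


# Hypothetical data: rough boxes and made-up documents, for illustration only.
spain = PlaceNode("Spain", (-9.5, 36.0, 3.5, 43.8))
galicia = PlaceNode("Galicia", (-9.3, 41.8, -6.7, 43.8))
coruna = PlaceNode("A Coruña", (-8.5, 43.3, -8.3, 43.4))
spain.children.append(galicia)
galicia.children.append(coruna)

index = GeoTextIndex(spain)
index.add_document(1, "building licence for a hotel", ["A Coruña"])
index.add_document(2, "hotel opening in the region", ["Galicia"])
index.add_document(3, "hotel statistics", ["Spain"])

print(index.docs_under_place("Galicia"))
print(index.query("hotel", (0.0, 39.0, 1.0, 40.0)))
```

In this sketch, asking for documents under "Galicia" also returns the document that only mentions "A Coruña", which is the kind of query a place hierarchy makes possible and a text index alone would miss.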

  • Source
    ABSTRACT: Within the field of Geographic Information Systems (GIS), many organizations are carrying out very important work on building Spatial Data Infrastructures (SDIs) that allow them to share their geographic information. These SDIs, and GIS in general, not only manage geographic information but must also store and retrieve many types of text documents (building permits, case files, etc.). Providing efficient access to such documents requires textual indexing structures over them. In addition, the text of these documents often contains geographic references, so the indexing structure must also support queries that take into account those references and the special characteristics arising from their spatial nature. In this work we present a workflow process for building a document repository that can be accessed efficiently through queries on both the text of the documents and the geographic references cited in that text. We also briefly describe the indexing structure that answers these queries, which combines a textual index, a spatial index, and an ontology of space.
  • Source
    ABSTRACT: More people than ever before have access to information with the World Wide Web; information volume and number of users both continue to expand. Traditional search methods based on keywords are not effective, resulting in large lists of documents, many of which are unrelated to users’ needs. One way to improve information retrieval is to associate meaning to users’ queries by using ontologies, knowledge bases that encode a set of concepts about one domain and their relationships. Encoding a knowledge base using one single ontology is usual, but a document collection can deal with different domains, each organized into an ontology. This work presents a novel way to represent and organize knowledge, from distinct domains, using multiple ontologies that can be related. The model allows the ontologies, as well as the relationships between concepts from distinct ontologies, to be represented independently. Additionally, fuzzy set theory techniques are employed to deal with knowledge subjectivity and uncertainty. This approach to organizing knowledge and an associated query expansion method are integrated into a fuzzy model for information retrieval based on multi-related ontologies. The performance of a search engine using this model is compared with another fuzzy-based approach for information retrieval, and with the Apache Lucene search engine. Experimental results show that this model improves precision and recall measures.
    Knowledge and Information Systems 03/2012; DOI:10.1007/s10115-012-0482-0 · 2.64 Impact Factor
  • Source
    ABSTRACT: This project aims to generate new techniques and open-source software tools that help in the production and maintenance of large digital libraries and allow for more intensive exploitation of their contents. In particular, software has been implemented to enable the linguistically-enriched exploitation of digital libraries and, to some extent, to improve the transcription quality of digital content and to allow cheaper production of digital texts and metadata. Also, new methods to index and retrieve documents with structural markup (even in compressed form) in large collections have been explored. Additionally, new techniques for the creation and use of educational resources have been tested.
