Automatically maintaining navigation sequences for querying semi-structured web sources

Department of Information and Communication Technologies, Facultad de Informatica, Campus de Elviña s/n, University of A Coruña, 15071 A Coruña, Spain
Data & Knowledge Engineering (Impact Factor: 1.12). 12/2007; 63(3):795-810. DOI: 10.1016/j.datak.2007.04.009
Source: DBLP


A substantial subset of Web data has an underlying structure. For instance, the pages returned in response to a query submitted through a Web search form are usually generated by a program that retrieves structured data from a local database and embeds them into an HTML template. For software programs to benefit fully from these “semi-structured” Web sources, wrapper programs must be built to provide a “machine-readable” view over them. Because Web sources are autonomous, they may change in ways that invalidate the current wrapper, so automatic maintenance is an important issue. Wrappers must perform two tasks: navigating through Web sites and extracting structured data from HTML pages. While several works have addressed the automatic maintenance of data extraction tasks, the problem of maintaining navigation sequences remains, to the best of our knowledge, unaddressed. In this paper, we propose a set of novel techniques to fill this gap.
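To make the two wrapper tasks concrete, the following is a minimal sketch in Python and not the technique proposed in the paper. It assumes a hypothetical in-memory site (`SITE`), a hypothetical one-step navigation sequence (`navigate`), and a hypothetical row template with `class` attributes naming the fields. It shows how a wrapper turns a templated result page into records, and how a change on the source side can break the navigation step even though the extraction logic is untouched.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "site": maps a URL to the HTML it returns.
# A real wrapper would issue HTTP requests; this stand-in keeps the sketch offline.
SITE = {
    "/search?q=db": (
        "<table>"
        "<tr><td class='title'>Data Engineering</td><td class='price'>30</td></tr>"
        "<tr><td class='title'>Web Wrappers</td><td class='price'>25</td></tr>"
        "</table>"
    ),
}


class RecordExtractor(HTMLParser):
    """Extraction task: turn templated <tr>/<td class=...> rows into dicts."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.records.append({})  # each row starts a new record
        elif tag == "td":
            self._field = dict(attrs).get("class")

    def handle_data(self, data):
        if self._field and self.records:
            self.records[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None


def navigate(site, query):
    """Navigation task: a single assumed step that builds the result URL."""
    url = f"/search?q={query}"
    if url not in site:
        # The source changed, so the stored navigation sequence no longer works.
        raise LookupError(f"navigation sequence broken at {url}")
    return site[url]


def run_wrapper(site, query):
    """Run navigation, then extraction, and return the structured records."""
    parser = RecordExtractor()
    parser.feed(navigate(site, query))
    return parser.records
```

Calling `run_wrapper(SITE, "db")` yields one dictionary per result row. If the same page were served under a different URL, `navigate` would raise `LookupError` while `RecordExtractor` would still work on the page, which is the kind of navigation failure that maintenance has to detect and repair.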

Alberto Pan