Andres Sanoja

Central University of Venezuela | UCV · Escuela de Computación

PhD Computer Science

About

18
Publications
17,853
Reads
87
Citations
Introduction
I currently work at Central University of Venezuela. I do research in Algorithms, Databases, Web Archives and Distributed Computing. My current project is "Optimization and Analysis of Web Archives: Web Archive of Venezuela."
Additional affiliations
January 2022 - March 2022
Central University of Venezuela
Position
  • Computer Science Graduate Studies Coordinator
Description
  • Teaching, Administrative Activity
September 2011 - February 2015
Sorbonne Université
Position
  • PhD Student
October 2005 - March 2022
Central University of Venezuela
Position
  • Coordinator of the Parallel and Distributed Systems Centre
Education
September 2011 - January 2015
Sorbonne Université
Field of study
  • Web page segmentation, evaluation and applications
September 2005 - July 2008
Central University of Venezuela
Field of study
  • Web Content Extraction

Publications (18)
Technical Report
Full-text available
This Technical Report describes the development of an evaluation tool defined in a previous PhD thesis. Initially it was intended for evaluating Web pages by very experienced users. This tool allows general users to create evaluations, following the predefined evaluation model and metrics.
Article
Full-text available
Web page segmentation is an important task in Web page analysis. The objective is to divide a Web page into blocks, each one representing a coherent part (or segment) of the content. In this work we describe the development of the Manual-design of Blocks (MoB). At the same time we describe how to get a ground truth of segmentations and how to com...
Technical Report
Full-text available
The main objective of this report is to describe the development of a tool for building a ground truth of manual segmentations of Web pages. A model is proposed for choosing the "best" segmentation, which is a selection of the most popular blocks among a set of segmentations done by several users. The tool is developed as an extension of the Cho...
Conference Paper
Full-text available
Web archives (and the Web itself) are likely to suffer from format obsolescence. In a few years or decades, future Web browsers will no longer be able to properly render Web pages written in HTML4 format. Thus we propose a migration tool from HTML4 to HTML5. This is challenging, because it requires generating HTML5 semantic elements that do not exis...
Experiment Findings
Full-text available
This repository includes segmentation results for different algorithms, such as BoM, VIPS, jVIPS, BlockFusion and MIG45. Collections: GOSH and MIG5. This data is used mainly for evaluation. The main feature is the geometric aspect of the segmentation (i.e., rectangles), but content information is included as well.
Conference Paper
Full-text available
Web archives are not exempt from format obsolescence. In the near future, Web pages written in HTML4 format could become obsolete. We will have to choose between two preservation strategies: emulation or migration. The first option is the most evident; however, due to the size of the Web and the amount of information that Web archives handle, it is not pra...
Conference Paper
Full-text available
In this paper, we present a framework for evaluating segmentation algorithms for Web pages. Web page segmentation consists in dividing a Web page into coherent fragments, called blocks. Each block represents one distinct information element in the page. We define an evaluation model that includes different metrics to evaluate the quality of a seg...
Article
Full-text available
Web pages are becoming more complex than ever, as they are generated by Content Management Systems (CMS). Thus, analyzing them, i.e. automatically identifying and classifying different elements from Web pages, such as main content, menus, among others, becomes difficult. A solution to this issue is provided by Web page segmentation which refers to...
Article
Full-text available
In this paper we describe Block-o-Matic, a web page segmentation framework. It is a hybrid approach inspired by automated document processing methods and visual-based content segmentation techniques. A web page is associated with three structures: the DOM tree, the content structure and the logical structure. The DOM tree represents the HTML elemen...
Article
Full-text available
Poster presented at the iPRES 2012 conference, held at the Information Faculty of the University of Toronto, Canada.
Article
Full-text available
The motivation of this work is to provide the software leaders of the U.C.V. Science Faculty with criteria for selecting web technologies for the development of a module of the “Control de Estudios” System (CONEST), measuring those technologies with software metrics. The module was developed in two versions, using different Web...
Article
Full-text available
This article describes the design and implementation of Extratos, a Service Oriented Information Extraction System for web content sharing, based on web services as extractors and BPEL business process generation. Some insights from archaeological sciences are applied to the design of the system. It is organized in five subsystems: Xpathula, La...
Article
Full-text available
This document focuses on the analysis of electronic government in Venezuela, starting from the analysis of the strategies and guidelines established in the National Information Technology Plan (Ministerio de Ciencia y Tecnología, 2001) and from the foundations, objectives, guiding principles and legal bases defined for Electronic Gover...
Article
Full-text available
The worldwide trend toward transforming the State using existing ICT is today a reality under the name of e-government. Latin America is no exception, and its governments are working, together with multilateral organizations, to put in place the technology and knowledge needed to carry it out. There are two trends in the devel...
Article
Full-text available
E-government is a model of state development that consists of using Information and Communication Technologies (ICT) in the internal processes of government and in the external processes of interaction between the state and citizens, for the improvement of public services, the strengthening of responsibility in administ...

Projects (2)
Project
Research and tools for analyzing and optimizing Web archive management. The main focus is the study of Web page segmentation algorithms, Web standards, and change detection between versions. The concrete goal is twofold:
  • First, to produce a new stable version of the segmenter BOM and its associated tools, MOB and EOB (ground truth construction and segmentation evaluation, respectively).
  • Second, to build a prototype of a Web archive for organizations with limited resources and to apply change detection algorithms for analyzing data.
Archived project
SCAPE was an EU-funded project that addressed long-term digital preservation of large-scale and heterogeneous collections of digital objects. SCAPE developed scalable services for preservation planning and preservation actions on an open source platform. These services are based on a framework for automated, quality-assured workflows, which were elaborated and tested during the project runtime. A policy-based preservation planning tool and an automated watch system ensure a secure and targeted implementation of institutional preservation strategies. SCAPE preservation components are able to:
  • Identify the need to act to preserve all or parts of a repository through characterisation and trend analysis;
  • Define responses to those needs using formal descriptions of preservation policies and preservation plans;
  • Allow a high degree of automation and scalable processing;
  • Monitor the quality of preservation processes.