V. Carneiro

University of A Coruña, A Coruña, Galicia, Spain


Publications (23) · 5.3 Total impact

  • Source
    Victor M. Prieto · Manuel Álvarez · Víctor Carneiro · Fidel Cacheda
    ABSTRACT: Search engines use crawlers to traverse the Web in order to download web pages and build their indexes. Keeping these indexes up-to-date is an essential task to ensure the quality of search results. However, changes in web pages are unpredictable. Identifying the moment when a web page changes, as soon as possible and with minimal computational cost, is a major challenge. In this article we present the Web Change Detection system that, in the best case, is capable of detecting, almost in real time, when a web page changes. In the worst case, it requires, on average, 12 minutes to detect a change on a web site with low PageRank and about one minute on a web site with high PageRank. Meanwhile, current search engines require more than a day, on average, to detect a modification in a web page (in both cases). © 2015, Computer Science and Information Systems. All rights reserved.
    Preview · Article · Jan 2015 · Computer Science and Information Systems
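The paper's detection mechanism is not described in this abstract; purely as an illustrative sketch of the general task, a minimal polling-based detector can compare content fingerprints between visits (the `fetch` callable, the polling interval, and the hashing choice are assumptions, not the authors' method):

```python
import hashlib
import time
from typing import Callable

def fingerprint(content: bytes) -> str:
    """Hash a page body so two versions can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()

def detect_change(fetch: Callable[[], bytes], interval_s: float, max_polls: int) -> bool:
    """Poll a page and report whether its content changed between polls."""
    baseline = fingerprint(fetch())
    for _ in range(max_polls):
        time.sleep(interval_s)
        if fingerprint(fetch()) != baseline:
            return True
    return False
```

A real system would trade the fixed polling interval for an adaptive schedule (the abstract suggests PageRank influences detection latency), but the hash-comparison core would look similar.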
  • Vreixo Formoso · Diego Fernández · Fidel Cacheda · Victor Carneiro
    ABSTRACT: Collaborative filtering is one of the most popular recommendation techniques. While the quality of the recommendations has improved significantly in recent years, most approaches suffer from poor efficiency and scalability. In this paper, we study several factors that affect the performance of a k-Nearest Neighbors algorithm, and we propose a distributed architecture that significantly improves both throughput and response time. Two techniques for distributing recommender systems, user partition and item partition, were proposed and evaluated using a simulation model. We have found that user partition is generally better, with a faster response time and higher throughput.
    No preview · Article · Jun 2014 · World Wide Web
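The paper's architecture is not reproduced here; as a toy sketch of the user-partition idea (hypothetical names, not the authors' code), each node holds a disjoint shard of user profiles, scores the target user against its local shard, and a coordinator merges the per-shard candidates into a global top-k:

```python
import heapq
from math import sqrt

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse rating vectors."""
    common = set(u) & set(v)
    num = sum(u[i] * v[i] for i in common)
    den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def local_top_k(target: dict, shard: dict, k: int):
    """One node scores the target user against its local user shard."""
    return heapq.nlargest(k, ((cosine(target, r), uid) for uid, r in shard.items()))

def knn_user_partition(target: dict, shards: list, k: int):
    """Coordinator merges per-shard candidates into the global k nearest neighbours."""
    candidates = [c for shard in shards for c in local_top_k(target, shard, k)]
    return heapq.nlargest(k, candidates)
```

Because each node only touches its own shard, neighbour search parallelises naturally, which is consistent with the throughput gains the abstract reports for user partition.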
  • Fidel Cacheda · Víctor Carneiro · Diego Fernández · Vreixo Formoso

    No preview · Article · Feb 2011 · ACM Transactions on the Web
  • F. Cacheda · V. Carneiro · D. Fernández · V. Formoso
    ABSTRACT: The performance evaluation of an IR system is a key point in the development of any search engine, especially on the Web. To achieve the performance users expect, Web search engines are built on large-scale distributed systems, and optimising their performance is an important topic in the literature. The main methods found in the literature for analysing the performance of a distributed IR system are: an analytical model, a simulation model, and a real search engine. When using an analytical or simulation model, some details may be missing, which introduces differences between the real and estimated performance. When using a real system, the results obtained are more precise, but the resources required to build a large-scale search engine are excessive. In this paper we propose to study the performance by building a scaled-down version of a search engine, using virtualization tools to create a realistic distributed system. Scaling down a distributed IR system maintains the behaviour of the whole system while reducing the hardware requirements. This allows the use of virtualization tools to build a large-scale distributed system using just a small cluster of computers.
    No preview · Article · Jan 2010
  • Source
    Vreixo Formoso · Fidel Cacheda · Víctor Carneiro
    ABSTRACT: In this work we present a series of collaborative filtering algorithms notable for their simplicity and efficiency. Their efficiency was compared with that of other, more representative collaborative filtering algorithms. The results demonstrate that the response times are better than those of the rest (by at least two orders of magnitude), both in training and when making predictions. Furthermore, in terms of prediction quality, the behavior of our algorithms is similar to that of the other algorithms, and even better when dealing with low-density training sets.
    Full-text · Article · Jan 2008
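The abstract does not name the specific algorithms; as one example of the kind of simple, fast collaborative filtering it describes (an illustrative baseline, not the authors' method), a bias model trains in a single pass over the ratings and predicts in constant time:

```python
from collections import defaultdict

class BiasBaseline:
    """Predict a rating as global mean + user bias + item bias.
    Training is one pass over the ratings; prediction is O(1)."""

    def fit(self, ratings):
        # ratings: iterable of (user, item, value) triples
        ratings = list(ratings)
        self.mu = sum(v for _, _, v in ratings) / len(ratings)
        usum, ucnt = defaultdict(float), defaultdict(int)
        isum, icnt = defaultdict(float), defaultdict(int)
        for u, i, v in ratings:
            usum[u] += v - self.mu; ucnt[u] += 1
            isum[i] += v - self.mu; icnt[i] += 1
        self.bu = {u: usum[u] / ucnt[u] for u in usum}
        self.bi = {i: isum[i] / icnt[i] for i in isum}
        return self

    def predict(self, user, item):
        # Unknown users or items fall back to zero bias (the global mean).
        return self.mu + self.bu.get(user, 0.0) + self.bi.get(item, 0.0)
```

Such baselines also degrade gracefully on sparse data, since an unseen user or item simply contributes no bias term, which mirrors the abstract's observation about low-density training sets.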
  • Source
    ABSTRACT: Most of today's web sources do not provide suitable interfaces for software programs to interact with them. Many researchers have proposed highly effective techniques to address this problem. Nevertheless, ad-hoc solutions are still frequent in real-world web automation applications. Arguably, one of the reasons for this situation is that most proposals have focused on query wrappers, which transform a web source into a special kind of database in which some queries can be executed using a query form and return resultsets composed of structured data records. Although the query wrapper model is often useful, it is not appropriate for applications that make decisions according to the data retrieved, or for processes that use forms that can be modelled as insert/update/delete operations. This article proposes a new language for defining web automation processes, based on a wide range of real-world web automation tasks that are being used by corporations from different business areas.
    Preview · Article · Jan 2008 · JOURNAL OF UNIVERSAL COMPUTER SCIENCE
  •
    ABSTRACT: A substantial subset of Web data has an underlying structure. For instance, the pages obtained in response to a query executed through a Web search form are usually generated by a program that accesses structured data in a local database, and embeds them into an HTML template. For software programs to gain full benefit from these “semi-structured” Web sources, wrapper programs must be built to provide a “machine-readable” view over them. Since Web sources are autonomous, they may experience changes that invalidate the current wrapper, thus automatic maintenance is an important issue. Wrappers must perform two tasks: navigating through Web sites and extracting structured data from HTML pages. While several works have addressed the automatic maintenance of data extraction tasks, the problem of maintaining the navigation sequences remains unaddressed to the best of our knowledge. In this paper, we propose a set of novel techniques to fill this gap.
    No preview · Article · Dec 2007 · Data & Knowledge Engineering
  • Vreixo Formoso · Fidel Cacheda · Victor Carneiro · Juan Valino
    ABSTRACT: Although there are quite a few Open Source monitoring applications, they have not yet reached the necessary maturity level. Many users face important problems when deploying a monitoring system for their networks. In this paper we compare the most popular open source monitoring tools and analyze their main limitations. As a solution to these problems we propose a new monitoring tool that incorporates several notable improvements, such as centralized configuration via web, support for monitoring templates, a hierarchical structure of objects to handle the management information, and support for centralized and distributed monitoring schemes. We describe its architecture in detail and show its use in a real environment, which makes it possible to verify the importance of the improvements that have been developed.
    No preview · Conference Paper · Oct 2007
  • Source
    ABSTRACT: The crawler engines of today cannot reach most of the information contained in the Web. A great amount of valuable information is "hidden" behind the query forms of online databases, and/or is dynamically generated by technologies such as JavaScript. This portion of the web is usually known as the Deep Web or the Hidden Web. We have built DeepBot, a prototype hidden-web crawler able to access such content. DeepBot receives as input a set of domain definitions, each one describing a specific data-collecting task, and automatically identifies and learns to execute queries on the forms relevant to them. In this paper we describe the techniques employed for building DeepBot and report the experimental results obtained when testing it with several real-world data collection tasks.
    Full-text · Conference Paper · Aug 2007
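DeepBot's actual form-matching techniques are not detailed in this abstract; a toy sketch of the general idea (hypothetical structures, not the authors' code) scores a web form against a domain definition by how many of the domain's attribute keywords appear in the form's field labels:

```python
def form_relevance(field_labels, domain_keywords, threshold=0.5):
    """Score a web form against a data-collecting domain definition.
    The form is considered relevant when enough of the domain's
    attributes are matched by some field label."""
    labels = [label.lower() for label in field_labels]
    matched = sum(
        any(kw in label for label in labels)
        for kw in (k.lower() for k in domain_keywords)
    )
    score = matched / len(domain_keywords)
    return score, score >= threshold
```

For example, a book-search domain defined by the keywords `title`, `author`, and `price` would match a form whose fields are labelled "Book title", "Author name", and "ISBN", but not a flight-search form.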
  • Source
    ABSTRACT: The crawler engines of today cannot reach most of the information contained in the Web. A great amount of valuable information is "hidden" behind the query forms of online databases, and/or is dynamically generated by technologies such as JavaScript. This portion of the web is usually known as the Deep Web or the Hidden Web. We have built DeepBot, a prototype hidden-web focused crawler able to access such content. DeepBot receives as input a set of domain definitions, each one describing a specific data-collecting task, and automatically identifies and learns to execute queries on the forms relevant to them. In this paper we describe the techniques employed for building DeepBot and report the experimental results obtained when testing it with several real-world data collection tasks.
    Full-text · Conference Paper · Jan 2007
  • Source
    ABSTRACT: Wrapping existing Web applications into portals makes it possible to protect investment and improves the user experience. Most current portlet-based portal servers provide a bridge portlet that makes it possible to "portletize" a single Web page, that is, to wrap the whole page or a set of regions as a portlet. They use an annotation-based approach to specify the page's regions that must be extracted. This approach does not scale well when a whole application is to be portletized, since it requires manually annotating each page. This paper describes the design of a bridge portlet that automatically adapts pages according to the space available in the portlet's window. The bridge portlet delegates page adaptation to a framework that uses a chain of user-configurable "transformers". Each transformer implements an automatic page adaptation technique. Experiments show that our approach is effective.
    Full-text · Conference Paper · Dec 2006
  • Source
    Francisco Puentes · Victor Carneiro
    ABSTRACT: The present document describes the VAIN (Virtual Active IP Node) architecture, which enables users to deploy new network services based on virtual active networks, and how it solves the challenge of segmenting the incoming traffic that crosses nodes towards the services, while preserving the original objective of protocol independence (Tennenhouse, 1996). Our solution is based on network expressions that use all the semantics contained in each incoming packet, without needing to know the inner structure of the protocols. The VAIN architecture has been developed in response to challenges posed by electronic commerce, specifically those regarding collaborative environments and marketplaces. To achieve this objective we have considered the following goals: first, a three-layer conceptualization; second, a transparent deployment and its integration with existing infrastructures; and third, a strategy of network traffic distribution based on all the information within the input packets, which is named "expressions based distribution".
    Full-text · Article · Jan 2005 · International Journal of Web Based Communities
  • Source
    F. Cacheda · V. Carneiro · V. Plachouras · I. Ounis
    ABSTRACT: In this study, we present the analysis of the interconnection network of a distributed Information Retrieval (IR) system, by simulating a switched network versus a shared access network. The results show that the use of a switched network improves the performance, especially in a replicated system, because the switched network prevents the saturation of the network, particularly when using a large number of query servers.
    Preview · Article · Jan 2005 · Lecture Notes in Computer Science
  • F. Puentes · V. Carneiro
    ABSTRACT: Modern communication infrastructures are formed by devices which are able to process the traffic that crosses them. In this context, an important task inside each active device is to classify the incoming traffic and send it towards its service. We present a classifier that, improving on other implementations, is able to perform protocol-independent packet classification with excellent performance using runtime code generation techniques. This classifier is being used inside VAIN (virtual active independent node), our platform for building active networks. We also introduce some aspects of VAIN to explain how ITCAN performs its tasks in this environment.
    No preview · Conference Paper · Jul 2004
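The ITCAN classifier itself is not shown in this abstract; as a rough analogy for classification via runtime code generation (a Python illustration, not the paper's implementation), a filter rule over packet header fields can be compiled once into a callable and then applied per packet without re-parsing the rule:

```python
def compile_classifier(expr: str):
    """Compile a boolean rule over packet header fields into a callable,
    so the per-packet matching cost excludes rule parsing."""
    code = compile(expr, "<classifier>", "eval")

    def classify(packet: dict) -> bool:
        # The packet's header fields become the variables visible to the rule;
        # builtins are disabled so the rule can only inspect those fields.
        return bool(eval(code, {"__builtins__": {}}, packet))

    return classify
```

The real system generates native code rather than reusing an interpreter's compiler, but the shape is the same: pay the compilation cost once per rule, then classify each incoming packet with the generated code.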
  • F. Puentes · F. Cacheda · V. Carneiro
    ABSTRACT: This report describes the VAIN (Virtual Active IP Network) architecture, developed in response to challenges posed by electronic commerce, specifically collaborative environments and marketplaces. For this development we have considered the following goals: a three-layer conceptualization; a transparent deployment and its integration with existing infrastructures; and a strategy of network traffic distribution based on all the information from the input packets, by means of the so-called "patterns based distribution". VAIN mainly uses as guest code an interpreter of intermediate code from the .NET architecture, although the possibility remains open to use other intermediate codes. VAIN sits immediately above the link layer, can be extended to any other similar protocol, and is independent of upper protocols, whether or not they exist at the present time. Our architecture also presents a polymorphic character, since it allows changing its behavior in a transparent way and virtually emulating other architectures concurrently without affecting its functionality.
    No preview · Article · Jan 2003
  • J. Arribi · V. Carneiro

    No preview · Conference Paper · Jan 2000
  • J. Arribi · V. Carneiro

    No preview · Conference Paper · Jan 1999
  • V. Carneiro · A. Vina · C. Guerrero
    ABSTRACT: We introduce an alarm management system based on the management-by-delegation (MbD) paradigm, which provides the operator with an integrated and homogeneous environment in which different types of alarms exist. The platform chosen was Java owing to its special features (code mobility, platform independence, distributed capabilities, etc.). This system provides the programmer with a flexible, modular and robust environment where the functionality of the system can be increased dynamically without having to alter any part of it. This system overcomes most of the limitations inherent to centralised systems. Some of the key characteristics of the system are: protocol integration, the use of an RDBMS to enhance the information about alarms, and multi-user monitoring through an intuitive GUI applet with several permission constraints.
    No preview · Conference Paper · Feb 1998
  • C. Guerrero · D. Sanchez · V. Carneiro · A. Vina · J. Coego
    ABSTRACT: We address the problem of integrating proprietary managed technology in a corporate TMN system by using TMN-based platform support facilities. A prototype that integrates a proprietary managed PDH network in a fully TMN corporate management system has been designed and developed using three different TMN platforms in parallel: Solstice Enterprise Manager, NetView/6000 TMN Support Facility and OpenView DM. This experimental prototype helped us (1) to understand how the new emerging management platforms support the engineering of solutions to integrate proprietary protocols and (2) to identify potential problems that can arise when trying to apply the platform functionality to the real network elements.
    No preview · Conference Paper · Jul 1997
  • F. Munoz-Mansilla · J. Sanchez · V. Carneiro · J. Coego

    No preview · Conference Paper · Jan 1997