José Luis Bosque

Universidad de Cantabria, Santander, Cantabria, Spain


Publications (59) · 9.22 Total Impact

  • ABSTRACT: In a Content-based Video Retrieval system, shot boundary detection is an unavoidable stage. Such a highly demanding task needs a deep study from a computational point of view in order to find suitable optimization strategies. This paper presents different strategies implemented on both a shared-memory symmetric multiprocessor and a Beowulf cluster, and an evaluation of two programming paradigms: shared memory and message passing. Several approaches to video segmentation as well as data access are tested in experiments that also consider load balancing issues.
    The Journal of Supercomputing 04/2013; 64(1). · 0.92 Impact Factor
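    The abstract above does not detail the implementation, so the sketch below is only an illustration of the message-passing strategy it evaluates: frames are block-partitioned across MPI ranks, and a plain grayscale-histogram difference stands in for the paper's actual dissimilarity metric (frame sizes and the threshold are invented).

```python
# Minimal sketch, not the paper's code: block-partitioned shot-boundary
# detection with mpi4py. A grayscale-histogram L1 difference stands in
# for the paper's metric; frames are synthetic random data.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N_FRAMES, THRESHOLD = 256, 0.25              # illustrative values
per_rank = N_FRAMES // size
rng = np.random.default_rng(seed=rank)       # stand-in for decoded video
frames = rng.integers(0, 256, size=(per_rank, 64, 64))

def hist(frame):
    h, _ = np.histogram(frame, bins=64, range=(0, 255))
    return h / h.sum()

# Score transitions inside this rank's block; a real implementation
# would also exchange one "halo" frame with the left neighbor so the
# transition across block boundaries gets scored as well.
cuts, prev = [], hist(frames[0])
for i in range(1, per_rank):
    cur = hist(frames[i])
    if 0.5 * np.abs(cur - prev).sum() > THRESHOLD:  # L1 distance
        cuts.append(rank * per_rank + i)
    prev = cur

all_cuts = comm.gather(cuts, root=0)
if rank == 0:
    print(sorted(c for block in all_cuts for c in block))
```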
  • ABSTRACT: This paper presents a new formulation of the isoefficiency function that can be applied to parallel systems executing balanced or unbalanced workloads, making it possible to analyze the scalability of parallel systems in both cases. The validity of the new metric is evaluated using several synthetic benchmarks. The experimental results show the importance of taking unbalanced workloads into account when analyzing the scalability of parallel systems.
    The Journal of Supercomputing 04/2013; 64(1). · 0.92 Impact Factor
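    The new formulation itself is not reproduced in the abstract; for context, the classical isoefficiency relation it generalizes (W is the problem size, p the number of processors, T_o the total overhead) is:

```latex
% Classical isoefficiency relation; the paper's unbalanced-workload
% version is not given in the abstract, so only the baseline is shown.
\begin{align*}
  E &= \frac{T_1}{p\,T_p} = \frac{1}{1 + T_o(W,p)/W},\\
  W &= \frac{E}{1-E}\,T_o(W,p) \qquad \text{(isoefficiency: hold } E \text{ constant)}.
\end{align*}
% Under imbalance, T_p = \max_i t_i, so processor idle time enters T_o
% and raises the required growth rate of W.
```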
  • ABSTRACT: This paper presents a load balancing algorithm specifically designed for heterogeneous clusters composed of nodes with different computational capabilities. The method is based on a new index that takes into account two levels of processor heterogeneity: the number of cores per node and the computational power of each core. The experimental results show that this index achieves balanced workload distributions even on clusters where heterogeneity cannot be neglected.
    The Journal of Supercomputing 02/2013; · 0.92 Impact Factor
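    A minimal sketch of how such a two-level index could drive a distribution (the paper's exact index is not given in the abstract; node names and power figures are invented):

```python
# Illustrative sketch only: weight each node by cores x per-core power
# and deal tasks out in proportion, rounding by largest remainder.
def distribute(total_tasks, nodes):
    weights = {n: cores * power for n, (cores, power) in nodes.items()}
    total_w = sum(weights.values())
    shares = {n: total_tasks * w / total_w for n, w in weights.items()}
    alloc = {n: int(s) for n, s in shares.items()}
    # hand out the remaining tasks by largest fractional part
    for n in sorted(shares, key=lambda n: shares[n] - alloc[n], reverse=True):
        if sum(alloc.values()) == total_tasks:
            break
        alloc[n] += 1
    return alloc

nodes = {"old": (2, 1.0), "mid": (4, 1.5), "new": (8, 2.2)}  # (cores, power)
print(distribute(1000, nodes))  # {'old': 78, 'mid': 234, 'new': 688}
```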
  • ABSTRACT: Many current VLSI on-chip multiprocessors and systems-on-chip employ point-to-point switched interconnection networks. Rings and 2D meshes are among the most popular topologies for these increasingly important on-chip networks. Nevertheless, rings cannot scale beyond dozens of nodes, and meshes are asymmetric. Two key features of square 2D tori are their scalability and symmetry. As the growing number of cores (or specialized units) integrated on a chip demands higher scalability, and symmetry is critical for high performance and load balancing, we concentrate on 2D tori. However, the most popular deadlock-free routing mechanisms are based on Dimension Order Routing (DOR), which breaks the torus symmetry when managing adversarial traffic patterns. This paper analyzes this problem and its consequences, and then proposes a new deadlock-free, fully adaptive minimal routing mechanism, denoted σDOR, that preserves torus symmetry under any load. It uses just two virtual channels to avoid DOR-induced asymmetry, the same number as previous competitive proposals. σDOR behaves better than previous solutions because it allows packets to adapt dynamically to local congestion. Experimental results show the superior performance of the mechanism, confirming the negative impact of DOR asymmetry.
    Digital System Design (DSD), 2013 Euromicro Conference on; 01/2013
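    For reference, a sketch of the baseline DOR scheme that σDOR improves on (σDOR itself is adaptive and not specified in the abstract): route fully in X, taking the shorter way around the ring, then in Y.

```python
# Sketch of plain Dimension-Order Routing (DOR) on a k x k 2D torus:
# correct the X coordinate first (minimal direction on the ring),
# then Y. This is the deterministic baseline, not sigma-DOR.
def dor_path(src, dst, k):
    path, (x, y) = [src], src
    for dim in (0, 1):                       # X first, then Y: DOR order
        cur, goal = (x, y)[dim], dst[dim]
        delta = (goal - cur) % k
        step = 1 if delta <= k // 2 else -1  # shorter way around the ring
        while cur != goal:
            cur = (cur + step) % k
            x, y = (cur, y) if dim == 0 else (x, cur)
            path.append((x, y))
    return path

print(dor_path((0, 0), (3, 1), k=4))  # [(0, 0), (3, 0), (3, 1)]
```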
  • ABSTRACT: This paper presents an analysis of a multi-GPU, multi-CPU environment, along with the different possible hybrid combinations. The analysis has been performed for a shot boundary detection application based on Zernike moments, although it is general enough to apply to many other application areas. A deep study of the performance, bottlenecks and design challenges shows the validity of this approach, achieving very high frames-per-second rates. The Zernike calculations are carried out on GPUs, using a proposed packing strategy to minimize host-device communication time.
    Journal of Parallel and Distributed Computing 09/2012; 72(9):1127-1133. · 1.12 Impact Factor
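    The benefit of packing can be illustrated with a simple latency model (the constants below are assumptions, not measurements from the paper): shipping F frames per transfer pays the per-transfer latency once instead of F times.

```python
# Illustrative cost model for packed vs. per-frame host-device copies:
# each transfer costs ALPHA (latency) plus BETA per byte. All numbers
# here are assumed for the sake of the example.
ALPHA = 10e-6          # seconds per transfer (latency), assumed
BETA  = 1 / 12e9       # seconds per byte (~12 GB/s link), assumed
FRAME_BYTES = 64 * 64  # one small grayscale frame
F = 512                # frames packed into one buffer

unpacked = F * (ALPHA + BETA * FRAME_BYTES)
packed   = ALPHA + BETA * (F * FRAME_BYTES)
print(f"unpacked: {unpacked*1e3:.2f} ms, packed: {packed*1e3:.2f} ms")
# -> the F-1 saved latencies dominate when frames are small
```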
  • ABSTRACT: This paper presents the Load Balancing for OpenCL (lbcl) library, which automatically solves load balancing issues in both multi-platform and heterogeneous environments. Using this library, a single kernel can be executed on a set of heterogeneous devices, giving each device an amount of work proportional to its computing power. A wrapper has been developed so the library can balance the workload of an existing application not only without changes to its source code, but without any recompilation stage. A general OpenCL profiler has also been developed to provide detailed profiling of the results.
    Parallel and Distributed Processing with Applications (ISPA), 2012 IEEE 10th International Symposium on; 01/2012
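    A minimal sketch of the proportional split at the heart of such a scheme (device names and speeds are invented; lbcl's actual interface is not described in the abstract): carve a kernel's 1-D global range into per-device (offset, size) chunks of the kind clEnqueueNDRangeKernel accepts.

```python
# Illustrative sketch: split a 1-D NDRange across devices in
# proportion to their (assumed) relative speeds; the last device
# absorbs the rounding remainder.
def split_ndrange(global_size, speeds):
    total, offset, chunks = sum(speeds.values()), 0, {}
    names = list(speeds)
    for i, dev in enumerate(names):
        if i < len(names) - 1:
            size = int(global_size * speeds[dev] / total)
        else:
            size = global_size - offset   # remainder goes to the last device
        chunks[dev] = (offset, size)
        offset += size
    return chunks

print(split_ndrange(1 << 20, {"cpu": 1.0, "igpu": 2.5, "dgpu": 6.5}))
```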
  • ABSTRACT: This paper analyzes the robustness of king networks with respect to fault tolerance. To this end, a performance evaluation of two well-known fault-tolerant routing algorithms is carried out on both king and 2D networks: Immunet, which uses two virtual channels, and Immucube, which offers better performance but requires three virtual channels. Experimental results confirm the excellent behavior, both in performance and scalability, of king topologies in the presence of failures. Finally, taking advantage of the topological features of king networks, a new fault-tolerant routing algorithm for these networks is presented. From a cost/performance point of view, this algorithm is a compromise between the two previous ones.
    High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), 2012 IEEE 14th International Conference on; 01/2012
  • ABSTRACT: This paper presents a new expression for an isoefficiency function that can be applied to both homogeneous and heterogeneous systems. Using this new function, called H-isoefficiency, it is possible to analyze the scalability of heterogeneous clusters. To show how the new metric can be used, a theoretical a priori analysis of the scalability of a Gauss elimination algorithm is presented, together with a model evaluation that demonstrates the correlation between the theoretical analysis and the experimental results.
    The Journal of Supercomputing 01/2011; 58:367-375. · 0.92 Impact Factor
  • ABSTRACT: Nowadays TV channels generate a large amount of video data every day. A huge number of videos of news, shows, series, movies, and so on have to be stored for later access. Moreover, channels have a clear need to share videos in order to establish real collaboration among themselves and minimize the cost of information acquisition. These features demand a huge storage capacity and an environment for sharing information, both of which can be provided by grid computing: it offers the computing and storage capacity required to hold this great volume of data, as well as the resource-sharing capabilities needed for the cooperation of different TV channels. This paper presents a video retrieval system that covers these needs and proposes a work allocation (WA) broker to improve the performance of video accesses. An evaluation shows the feasibility and scalability of this approach, highlighting the benefits of the WA broker for storing and retrieving large video data. Copyright © 2009 John Wiley & Sons, Ltd.
    Concurrency and Computation Practice and Experience 01/2010; 22:1450-1475. · 0.85 Impact Factor
  • Rafael Menéndez de Llano, José Luis Bosque
    ABSTRACT: Artificial neural nets are among the most commonly used methods for data pre-processing in high-energy physics. The training phase of an ANN is critical to obtaining a net that can generalize the available data for use in new situations, but from the computational point of view this phase is very costly and resource intensive. The aim of this work is therefore to parallelize and evaluate the performance and scalability of the kernel of a training algorithm for a multilayer perceptron used to analyze data from the Large Electron Positron Collider at CERN. The training methods selected were linear-BFGS and hybrid linear-BFGS. Different approaches to the parallel implementation are presented and evaluated. To perform a complete performance and scalability evaluation, three different parallel architectures are used: a shared-memory multiprocessor, a cluster and a grid environment.
    Future Generation Computer Systems 01/2010; 26:267-275.
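    The abstract does not give the parallel decomposition, but a common pattern for such training kernels, sketched here as an assumption, is data parallelism over training examples: the error gradient is a sum over examples, so each node evaluates its own chunk and the partial gradients are reduced. A single-layer net with squared error stands in for the MLP.

```python
# Illustrative data-parallel gradient accumulation. Each chunk below
# would live on a different node; the sum plays the role of the
# reduction step (e.g. MPI_Allreduce) in a real cluster run.
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 8)), rng.normal(size=1000)   # toy data
w = np.zeros(8)                                            # net weights

def local_grad(Xc, yc, w):
    err = Xc @ w - yc            # residuals on this chunk
    return 2 * Xc.T @ err        # gradient of the chunk's squared error

chunks = np.array_split(np.arange(len(X)), 4)              # 4 "nodes"
grad = sum(local_grad(X[idx], y[idx], w) for idx in chunks)
assert np.allclose(grad, 2 * X.T @ (X @ w - y))            # matches serial
```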
  • Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31 - September 3, 2010, Proceedings, Part II; 01/2010
  • ABSTRACT: Grid computing environments are set up mainly to encourage the shared use of different resources based on business or scientific needs. The way these resources are shared, in terms of CPU cycles, storage capacity, software licenses, etc., is normally dictated by their availability outside the local administrative context. The Semantic Grid is the extension of grid computing with Semantic Web technologies: it represents grid management data in a machine-understandable format, so that reasoning can handle complicated situations in virtual organization management. This paper presents an extension of the collaborative awareness model (CAM) to manage virtual organizations in Semantic Grid environments. CAM applies some theoretical principles of awareness models to promote resource interaction and management, as well as task delivery.
    Future Generation Computer Systems 01/2010; 26:276-280.
  • ABSTRACT: The main aim of the SMILE project is to build an efficient low-cost cluster based on FPGA boards in order to take advantage of their reconfigurable capabilities. This paper describes the cluster architecture: the SMILE nodes, the high-speed communication network that connects them, and the software environment. Because simulating complex applications can be very hard, a SystemC model of the whole system has been designed to simplify this task and provide error-free downloading and execution of applications on the cluster. The hardware-software co-design process involved in the architecture and the SystemC design is presented as well. The cluster's functionality is tested by executing a real, complex Content-Based Information Retrieval (CBIR) parallel application, and its performance (time, power and cost) is compared with a traditional cluster approach.
    Journal of Systems Architecture - Embedded Systems Design. 01/2010; 56:633-640.
  • Pilar Herrero, José Luis Bosque, María S. Pérez
    11/2009: pages 167-185; ISBN: 9780470455432
  • Pilar Herrero, José Luis Bosque, María S. Pérez
    ABSTRACT: Cooperation among computational nodes to solve a common parallel application is one of the most outstanding features of grid environments. From the performance point of view, however, it is very important that this cooperation be carried out in an equilibrated way; otherwise some nodes may be overloaded while others are underused. This paper presents CAMBLE (Cooperative Awareness Model for Balancing the Load in grid Environments), which applies some theoretical principles of awareness models to promote efficient, autonomous, equilibrated and cooperative task delivery in grid environments. This cooperative task management has been implemented and tested, with very successful results, on a real heterogeneous grid infrastructure composed of several virtual organizations. The paper presents some of these outcomes, with emphasis on the overhead and efficacy of the system using this model.
    Multiagent and Grid Systems. 01/2009; 5:267-286.
  • ABSTRACT: The use of computational systems to help make the right investment decisions in financial markets is an open research field in which many efforts have been carried out in recent years. The ability to improve the assessment process and to be faster than the other players is one of the keys to success in this competitive scenario. This paper explores different options for accelerating the option pricing problem (a supercomputer, an FPGA cluster and a GPU), using the Monte Carlo method to price options under the Black-Scholes model, and presents a quantitative study of their performance and scalability.
    Parallel and Distributed Processing Symposium, International. 01/2009;
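    A compact sketch of the computation being accelerated (market parameters are illustrative): Monte Carlo pricing of a European call under Black-Scholes dynamics. Every path is independent, which is why the problem maps so well onto GPUs, FPGA clusters and supercomputers alike.

```python
# Monte Carlo pricing of a European call under Black-Scholes dynamics.
# All market parameters below are made-up inputs for illustration.
import numpy as np

S0, K, r, sigma, T = 100.0, 105.0, 0.05, 0.2, 1.0   # assumed inputs
N = 1_000_000                                       # independent paths

rng = np.random.default_rng(42)
Z = rng.standard_normal(N)
# terminal price under geometric Brownian motion
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
payoff = np.maximum(ST - K, 0.0)
price = np.exp(-r * T) * payoff.mean()              # discounted mean payoff
print(f"MC price: {price:.4f}  (closed form is ~8.02 for these inputs)")
```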
  • ABSTRACT: The SMILE project accelerates scientific and industrial applications by means of a cluster of low-cost FPGA boards. With this approach the intensive calculation tasks are accelerated using the FPGA logic, while the communication patterns of the applications remain unchanged, using a message passing library over Linux. This paper explains the cluster architecture: the SMILE nodes and the high-speed communication network developed for the FPGA RocketIO interfaces. A SystemC model developed to simulate the cluster is also detailed. To show the potential of the SMILE proposal, a Content-Based Information Retrieval parallel application has been developed and compared with an HP cluster architecture in terms of response time and power consumption.
    Field Programmable Logic and Applications, 2008. FPL 2008. International Conference on; 10/2008
  • ABSTRACT: The SMILE project aims to build efficient low-cost clusters based on FPGA boards, using their reconfigurability. A real parallel Content-Based Information Retrieval application running on the SMILE cluster is presented. Using this application, the SMILE cluster's performance is evaluated and compared with a traditional cluster architecture in terms of time and power consumption.
    Field-Programmable Custom Computing Machines, 2008. FCCM '08. 16th International Symposium on; 05/2008
  • Marta Beltrán, Antonio Guzmán, José Luis Bosque
    ABSTRACT: The success of different computing models, performance analysis, and load balancing algorithms depends on the processor availability information because there is a strong relationship between a process response time and the processor time available for its execution. Therefore, predicting the processor availability for a new process or task in a computer system is a basic problem that arises in many important contexts. Unfortunately, making such predictions is not easy because of the dynamic nature of current computer systems and their workload, which can vary drastically in a short interval of time. This paper presents two new availability prediction models. The first, called the SPAP (static process assignment prediction) model, is capable of predicting the CPU availability for a new task on a computer system having information about the tasks in its run queue. The second, called the DYPAP (dynamic process assignment prediction) model, is an improvement of the SPAP model and is capable of making these predictions from real-time measurements provided by a monitoring tool, without any kind of information about the tasks in the run queue. Furthermore, the implementation of this monitoring tool for Linux workstations is presented. In addition, the results of an exhaustive set of experiments are reported to validate these two models and to evaluate the accuracy of their predictions.
    IEEE Transactions on Computers 08/2008; 57(7):865-875. · 1.38 Impact Factor
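    The abstract states the models' purpose but not their formulas; the sketch below only illustrates the fair-share intuition a static prediction like SPAP can build on (the CPU-demand figures are invented): queued tasks that want less than an equal share of the CPU free the remainder, and an incoming CPU-bound task receives an equal split of what is left.

```python
# Illustrative fair-share model only, not the paper's SPAP/DYPAP:
# predict the CPU fraction a new, fully CPU-bound task would get on a
# node whose run queue holds tasks with the given CPU demands (0..1).
def predict_availability(demands):
    tasks = sorted(demands) + [1.0]    # the incoming task wants a full CPU
    cpu, share = 1.0, 0.0
    for i, d in enumerate(tasks):
        share = cpu / (len(tasks) - i) # equal split among remaining tasks
        if d <= share:
            cpu -= d                   # light task: takes only what it wants
        else:
            return share               # heavy tasks all get the equal split
    return share

print(predict_availability([]))          # 1.0   (idle node)
print(predict_availability([1.0, 1.0]))  # ~0.33 (two CPU-bound tasks queued)
print(predict_availability([0.1, 1.0]))  # 0.45  (one light, one heavy)
```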

Publication Stats

197 Citations
9.22 Total Impact Points

Institutions

  • 2006–2013
    • Universidad de Cantabria
      • Computers and Electronics
      • Faculty of Sciences
      Santander, Cantabria, Spain
    • UPM (Universidad Politécnica de Madrid)
      Madrid, Spain
  • 2000–2006
    • King Juan Carlos University
      • Computer Architecture and Technology and Computational Sciences and Artificial Intelligence
      Madrid, Spain
  • 2002–2005
    • Hospital Rey Juan Carlos - Madrid
      Madrid, Madrid, Spain