Antonio González

University of Granada, Granada, Andalusia, Spain


Publications (196) · 21.87 Total Impact

  • Enrique Leyva, Yoel Caises, Antonio González, Raúl Pérez
    ABSTRACT: Many authors agree that, when applying instance selection to a data set, it would be useful to characterize the data set in order to choose the most suitable selection criterion. Based on this hypothesis, we propose an architecture for knowledge-based instance selection (KBIS) systems. It uses meta-learning to select the best suited instance selection method for each specific database, among several methods available. We carried out a study in order to verify whether this architecture can outperform the individual methods. Two different versions of a KBIS system based on our architecture, each using a different learner, were instantiated. They were evaluated experimentally and the results were compared to those of the individual methods used.
    Information Sciences. 01/2014; 266:16–30.
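The KBIS architecture described above can be sketched as a meta-learner that maps simple data-set characteristics to a recommended instance selection method. The meta-features, method names, and training pairs below are hypothetical placeholders, not the components used in the paper.

```python
# Minimal sketch of a knowledge-based instance selection (KBIS) selector:
# a meta-learner is trained on (data-set measures -> best method) pairs and
# then recommends a method for an unseen data set. Measures and methods are
# illustrative placeholders.

def characterize(dataset):
    """Compute simple meta-features of a data set (list of (features, label))."""
    n = len(dataset)
    n_classes = len({label for _, label in dataset})
    return (n, n_classes)

class KBISSelector:
    def __init__(self):
        self.memory = []  # (meta_features, best_method) training pairs

    def train(self, dataset, best_method):
        self.memory.append((characterize(dataset), best_method))

    def select(self, dataset):
        """1-nearest-neighbour meta-learning over the stored examples."""
        target = characterize(dataset)
        dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
        _, method = min(self.memory, key=lambda m: dist(m[0], target))
        return method

selector = KBISSelector()
selector.train([([0.1], 0)] * 50, best_method="ENN")                 # small set
selector.train([([0.2], i % 5) for i in range(5000)], best_method="DROP3")
print(selector.select([([0.3], i % 4) for i in range(4000)]))  # -> DROP3
```

A real KBIS system would use richer complexity measures and a full learner instead of 1-NN, but the control flow is the same: characterize, look up, recommend.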
  • ABSTRACT: Modern-day microprocessors make extensive use of supply-voltage scaling for substantial power reduction. The minimum voltage below which a processor cannot operate reliably is defined as Vddmin. On-chip memories such as caches are the most susceptible to voltage-noise-induced failures because of process variations and reduced noise margins, and therefore determine the whole processor's Vddmin. In this paper, we evaluate the effectiveness of a new class of hybrid techniques in improving cache yield through failure prevention and correction. Proactive read/write assist techniques such as body biasing (BB) and wordline boosting (WLB), when combined with reactive techniques such as ECC and redundancy, are shown to offer better quality-energy-area trade-offs than their standalone configurations. Proactive techniques help lower Vddmin (improving functional margin) for significant power savings, while reactive techniques ensure that the resulting large number of failures is corrected (improving functional yield). Our results in 22nm technology indicate that at scaled supply voltages, hybrid techniques can improve parametric yield by at least 28% under worst-case process variations.
    International Symposium on Quality Electronic Design; 03/2013
  • ACM/IEEE International Symposium on Computer Architecture; 01/2013
  • Enrique Leyva, Antonio González, Raúl Pérez
    Knowledge-Based Systems. 01/2013;
  • Rakesh Kumar, Alejandro Martínez, Antonio González
    ABSTRACT: Leakage power is a growing concern in current and future microprocessors, and functional units are responsible for a major fraction of it. Reducing functional unit leakage has therefore received much attention in recent years. Power gating is one of the most widely used techniques to minimize leakage energy: it turns functional units off during idle periods, so the leakage energy savings are directly proportional to the duration of the idle intervals. This paper focuses on increasing the idle interval for the higher SIMD lanes. Applications are profiled dynamically, in a HW/SW co-designed environment, to find the usage pattern of the higher SIMD lanes. If the higher lanes would need to be turned on only for short periods, the corresponding portion of the code is devectorized to keep the higher lanes off, and the devectorized code is executed on the lowest SIMD lane. Our experimental results show average SIMD accelerator energy savings of 12% and 24% relative to power gating for SPECFP2006 and Physicsbench, respectively. Moreover, the slowdown caused by devectorization is less than 1%.
    Computer Architecture and High Performance Computing (SBAC-PAD), 2013 25th International Symposium on; 01/2013
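The devectorization decision described above can be sketched as a simple profile-driven policy: if the higher SIMD lanes of a region would be active for less than a break-even interval, the region is devectorized so the lanes can stay power-gated. The threshold and profile format are illustrative assumptions, not values from the paper.

```python
# Sketch: decide per code region whether to devectorize so the higher SIMD
# lanes can stay power-gated. A region is devectorized when its higher lanes
# would be active for fewer cycles than a break-even threshold (the cost of
# waking/sleeping the lanes). Numbers are illustrative, not from the paper.

BREAK_EVEN_CYCLES = 100  # assumed wake/sleep energy break-even point

def plan_lanes(regions):
    """regions: dict name -> cycles the higher lanes would be active.
    Returns dict name -> 'vector' (use all lanes) or 'devectorize'."""
    return {
        name: "vector" if active >= BREAK_EVEN_CYCLES else "devectorize"
        for name, active in regions.items()
    }

profile = {"hot_loop": 5000, "short_kernel": 30, "prologue": 8}
print(plan_lanes(profile))
```

In the paper's HW/SW co-designed setting this decision is made by the translation layer at runtime; the sketch only shows the shape of the policy.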
  • David García, Antonio González, Raúl Pérez
    ABSTRACT: This paper presents a proposal that introduces feature construction into a fuzzy rule learning algorithm. This is done by combining two different approaches together with a new learning strategy. The first approach uses relations in the antecedent of fuzzy rules, while the second employs functions in the antecedent of those rules. The method we propose integrates these two models so that, using a learning strategy that starts by learning more general rules and finishes by learning more specific ones, we are able to increase the amount of information extracted from the initial variables. The experimental results show that the proposed method obtains a good trade-off among accuracy, interpretability, and the time needed to build the model, compared with the other feature-construction algorithms in the study.
    Journal of Computer and System Sciences 01/2013; · 1.00 Impact Factor
  • ABSTRACT: Memory circuits play a key role in complex multicore systems, providing data and instruction storage as well as mailbox communication functions. There is general concern that the conventional SRAM cell based on the 6T structure could exhibit serious limitations in future CMOS technologies, due both to instability caused by transistor mismatch and to leakage consumption. For L1 data caches, the new 3T1D DRAM cell is considered a potential candidate to substitute 6T SRAMs. We first evaluate the impact of positive bias temperature instability (PBTI) on the access and retention time of the 3T1D memory cell implemented in a 45 nm technology. Then, we consider all sources of variation and the effect of the degradation caused by device aging on yield at the system level.
    Integration the VLSI Journal 06/2012; 45(3). · 0.41 Impact Factor
  • ABSTRACT: The exponential increase in multicore processor (CMP) cache sizes, accompanied by growing on-chip wire delays, makes it difficult to implement traditional caches with a single, uniform access latency. Non-Uniform Cache Architecture (NUCA) designs have been proposed to address this problem. A NUCA divides the whole cache memory into smaller banks and allows banks nearer a processor core to have lower access latencies than those further away, thus mitigating the effects of the cache's internal wires. Determining the best placement for data in the NUCA cache at any particular moment during program execution is crucial for exploiting the benefits that this architecture provides. Dynamic NUCA (D-NUCA) allows data to be mapped to multiple banks within the NUCA cache, and then uses data migration to adapt data placement to the program's behavior. Although the standard migration scheme is effective in moving data to its optimal position within the cache, half the hits still occur in non-optimal banks. This paper reduces this number by anticipating data migrations and moving data to the optimal banks before it is required. We introduce a prefetcher component to the NUCA cache that predicts the next memory request based on past behavior. We develop a realistic implementation of this prefetcher and, furthermore, experiment with a perfect prefetcher that always knows where the data resides, in order to evaluate the limits of this approach. We show that using our realistic data prefetching to anticipate data migrations in the NUCA cache can reduce the access latency by 15% on average and achieve performance improvements of up to 17%.
    ACM Transactions on Architecture and Code Optimization (TACO). 01/2012; 8:45.
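The prefetch-driven migration idea can be sketched with a minimal next-address predictor that moves the predicted block to the requesting core's closest bank before the access arrives. The bank model and predictor below are illustrative simplifications, not the paper's implementation.

```python
# Sketch of prefetch-driven migration in a D-NUCA cache: a simple
# next-address predictor (previous access -> next access) guesses the
# upcoming block and migrates it to the bank closest to the requesting
# core ahead of time. Bank numbering and the predictor are illustrative.

class DNuca:
    def __init__(self):
        self.location = {}   # block address -> bank index
        self.last_addr = None
        self.pattern = {}    # addr -> next addr observed after it

    def access(self, core_bank, addr):
        # record the access pattern for the predictor
        if self.last_addr is not None:
            self.pattern[self.last_addr] = addr
        self.last_addr = addr
        hit_bank = self.location.setdefault(addr, core_bank + 3)  # far bank
        # anticipate the next access: migrate predicted block next to the core
        predicted = self.pattern.get(addr)
        if predicted is not None and predicted in self.location:
            self.location[predicted] = core_bank  # closest (optimal) bank
        return hit_bank

cache = DNuca()
cache.access(0, 0x100)        # A
cache.access(0, 0x200)        # B; learns A -> B
cache.access(0, 0x100)        # A again; predicts B, migrates it to bank 0
print(cache.location[0x200])  # -> 0 (B now sits in the core's closest bank)
```

The real design layers this prediction on top of the standard gradual-migration scheme; the sketch only illustrates how a predicted block reaches its optimal bank before being requested.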
  • Abhishek Deb, Josep Maria Codina, Antonio González
    ABSTRACT: In this paper, we propose a novel programmable functional unit (PFU) to accelerate general-purpose application execution on a modern out-of-order x86 processor. Code is transformed and instructions are generated to run on the PFU using a co-designed virtual machine (Cd-VM). Results presented in this paper show that this HW/SW co-designed approach produces average speedups of 29% in SPECFP and 19% in SPECINT, and up to 55%, over a modern out-of-order processor.
    IEEE Computer Architecture Letters 01/2012; 11(1):9-12. · 0.85 Impact Factor
  • ABSTRACT: Software dynamic binary translators (DBT) and dynamic binary optimizers (DBO) are widely used for several reasons, including performance, design simplification, and virtualization. However, the software layer in such systems introduces non-negligible overheads which affect performance and user experience. Hence, reducing DBT/DBO overheads is of paramount importance. In addition, reduced overheads have interesting collateral effects in the rest of the software layer, such as allowing optimizations to be applied earlier. A cost-effective solution to this problem is to provide hardware support that speeds up the primitives of the software layer, automating the DBT/DBO mechanisms while leaving the heuristics to the software, which is more flexible. In this work, we characterize the overheads of a DBO system based on DynamoRIO that implements several basic optimizations. We find that the computation of the Data Dependence Graph (DDG) accounts for 5%-10% of the execution time. For this reason, we propose hardware support for this task in the form of a new functional unit, called DDGacc, which is integrated into a conventional pipelined processor and operated through new ISA instructions. Our evaluation shows that DDGacc reduces the cost of computing the DDG by 32x, which reduces overall execution time by 5%-10% on average and up to 18% for applications where the DBO optimizes large code footprints.
    ACM SIGPLAN Notices 01/2012; · 0.71 Impact Factor
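The DDG construction that DDGacc accelerates can be sketched in software as a single scan that, for each register an instruction reads, adds an edge from the most recent instruction that wrote it. The instruction encoding below is an illustrative assumption, not a real ISA.

```python
# Software sketch of Data Dependence Graph (DDG) construction, the primitive
# the DDGacc unit accelerates in hardware: scan the region once, and for
# every register an instruction reads, add an edge from the last instruction
# that wrote it. Instructions are illustrative (dest, src-list) pairs.

def build_ddg(instructions):
    """instructions: list of (dest_reg, [src_regs]); returns the edge list
    (producer_index, consumer_index) of register (RAW) dependences."""
    last_writer = {}   # register -> index of the most recent writer
    edges = []
    for i, (dest, srcs) in enumerate(instructions):
        for src in srcs:
            if src in last_writer:
                edges.append((last_writer[src], i))
        last_writer[dest] = i
    return edges

region = [
    ("r1", []),            # 0: r1 = const
    ("r2", ["r1"]),        # 1: r2 = f(r1)
    ("r1", []),            # 2: r1 = const   (redefinition)
    ("r3", ["r1", "r2"]),  # 3: r3 = g(r1, r2)
]
print(build_ddg(region))   # -> [(0, 1), (2, 3), (1, 3)]
```

Because the per-instruction work is a handful of table lookups and updates, this loop is a natural candidate for the kind of fixed-function acceleration the paper proposes.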
  • David García, Antonio González, Raúl Pérez
    International Journal of Uncertainty Fuzziness and Knowledge-Based Systems 01/2012; 20(supp02):31-49. · 0.89 Impact Factor
  • Antonio González, Raúl Pérez, Yoel Caises, Enrique Leyva
    ABSTRACT: Fuzzy modelling research has traditionally focused on certain types of fuzzy rules. However, the use of alternative rule models could improve the ability of fuzzy systems to represent a specific problem. In this proposal, an extended fuzzy rule model that can include relations between variables in the antecedent of rules is presented. Furthermore, a learning algorithm based on the iterative genetic approach, able to represent knowledge using this model, is proposed as well. On the other hand, potential relations among the initial variables imply an exponential growth of the feasible rule search space. Consequently, two filters for detecting relevant potential relations are added to the learning algorithm. These filters decrease the search space complexity and increase the algorithm's efficiency. Finally, we also present an experimental study to demonstrate the benefits of using fuzzy relational rules.
    International Journal of Computational Intelligence Systems. 01/2012; 5(2).
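A minimal sketch of a fuzzy relational rule, assuming an illustrative membership function and a crisp treatment of the relation in the antecedent, might look like this:

```python
# Sketch of a fuzzy relational rule: the antecedent mixes an ordinary fuzzy
# proposition ("temperature is High") with a relation between two input
# variables (temperature > setpoint). The membership function, the crisp
# relation degree, and the min t-norm are illustrative choices.

def high(x, a=20.0, b=30.0):
    """Shoulder membership for 'High': 0 below a, 1 above b, linear between."""
    return max(0.0, min(1.0, (x - a) / (b - a)))

def rule_fire(temperature, setpoint):
    """Firing strength of: IF temperature is High AND temperature > setpoint
    THEN cool. The relation contributes a 0/1 degree, combined with min."""
    relation = 1.0 if temperature > setpoint else 0.0
    return min(high(temperature), relation)

print(rule_fire(28.0, 22.0))  # High to degree 0.8 and relation holds
print(rule_fire(28.0, 29.0))  # relation fails, rule does not fire
```

The paper's model admits fuzzy (not just crisp) relations and learns such rules genetically; the sketch only shows how a relation enters the antecedent alongside ordinary fuzzy propositions.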
  • ABSTRACT: In view of device scaling issues, embedded DRAM (eDRAM) technology is being considered as a strong alternative to conventional SRAM for use in on-chip memories. Memory cells designed using eDRAM technology, in addition to being logic-compatible, are variation-tolerant and immune to noise present at low supply voltages. However, two major causes of concern are the data retention capability, which is worsened by parameter variations leading to frequent data refreshes (resulting in a large dynamic power overhead), and the transient reduction of stored charge, which increases soft-error (SE) susceptibility. In this paper, we present a novel variation-tolerant 4T-DRAM cell whose power consumption is 20.4% lower than that of a similarly sized eDRAM cell. The retention time is improved by 2.04x on average, while incurring a delay overhead of 3% on the read-access time. Most importantly, using a soft-error rate analysis tool, we have confirmed that the cell's sensitivity to SEs is reduced by 56% on average in a natural working environment.
    Computer Design (ICCD), 2012 IEEE 30th International Conference on; 01/2012
  • Rakesh Kumar, Alejandro Martínez, Antonio González
    Proceedings of the 21st international conference on Parallel architectures and compilation techniques; 01/2012
  • Proceedings of the 21st ACM Great Lakes Symposium on VLSI 2011, Lausanne, Switzerland, May 2-6, 2011; 01/2011
  • ABSTRACT: Microprocessor design validation is a time-consuming and costly task that tends to be a bottleneck in the release of new architectures. The validation step that detects the vast majority of design bugs is the one that stresses the silicon prototypes by applying huge numbers of random tests. Despite its bug-detection capability, this step is constrained by the extreme computing needs of random test simulation to extract the bug-free memory image for comparison with the actual silicon image. We propose a self-checking method that accelerates silicon validation and significantly increases the number of applied random tests, to improve bug-detection efficiency and reduce time-to-market. Analysis of four major ISAs (ARM, MIPS, PowerPC, and x86) reveals their inherent diversity: more than three quarters of the instructions can be replaced with equivalent instructions. We exploit this property in post-silicon validation and propose a methodology for the generation of random tests that detect bugs by comparing the results of equivalent instructions. We support our bug-detection method in hardware with a lightweight mechanism which, in case of a mismatch, replays the random test, replacing the offending instruction with its equivalent. Our bug-detection method and the corresponding hardware significantly accelerate the post-silicon validation process. Evaluation of the method on an x86 microprocessor model demonstrates its efficiency over simulation-based and self-checking alternatives, in terms of both bug-detection capability and validation-time speedup.
    ACM/IEEE International Symposium on Microarchitecture; 01/2011
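The self-checking idea can be sketched as follows: run random operands through an operation and a known-equivalent form, and flag a design bug on any mismatch, with no golden simulation image required. The equivalence pairs, bit width, and injected bug below are illustrative assumptions, not the paper's test generator.

```python
# Sketch of self-checking random tests via instruction equivalence: execute
# an operation and its known-equivalent form, and report a design bug on any
# mismatch -- no bug-free reference image is needed. The operation pairs and
# the injected bug are illustrative.

import random

MASK = 0xFFFFFFFF  # 32-bit datapath

EQUIVALENTS = {
    "add": (lambda x, y: (x + y) & MASK,
            lambda x, y: (x - (-y & MASK)) & MASK),   # add == sub of negation
    "shl1": (lambda x, y: (x << 1) & MASK,
             lambda x, y: (x + x) & MASK),            # shift-left == self-add
}

def self_check(op, trials=1000, buggy=False, seed=0):
    """Run random operands through an op and its equivalent; return True
    if any mismatch (i.e., a bug) is detected."""
    rng = random.Random(seed)
    primary, equivalent = EQUIVALENTS[op]
    for _ in range(trials):
        x, y = rng.getrandbits(32), rng.getrandbits(32)
        result = primary(x, y)
        if buggy:
            result ^= 0x4        # injected result bug in the 'design'
        if result != equivalent(x, y):
            return True          # mismatch: replay would isolate the culprit
    return False

print(self_check("add"))               # correct design: no mismatch
print(self_check("add", buggy=True))   # injected bug: mismatch detected
```

On silicon, the replay hardware substitutes the equivalent instruction for the offending one and re-runs the test; the sketch only shows the comparison that makes the test self-checking.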
  • Abhishek Deb, Josep M. Codina, Antonio González
    ABSTRACT: In this paper we propose SoftHV, a high-performance HW/SW co-designed in-order processor that performs horizontal and vertical fusion of instructions. SoftHV consists of a co-designed virtual machine (Cd-VM) which reorders, removes, and fuses instructions from frequently executed regions of code. On the hardware front, SoftHV implements features for the efficient execution of the Cd-VM and of the fused instructions. In particular, (1) an Interlock Collapsing ALU (ICALU) is included to execute pairs of dependent simple arithmetic operations in a single cycle, and (2) Vector Load Units (VLDU) are added to execute parallel loads. The key novelty of SoftHV resides in the efficient use of the hardware through a Cd-VM, providing high performance while drastically cutting down processor complexity. The co-designed processor provides efficient mechanisms to exploit ILP and reduce the latency of certain code sequences. Results presented in this paper show that SoftHV produces average performance improvements of 85% in SPECFP and 52% in SPECINT, and up to 2.35x, over a conventional four-way in-order processor. For a two-way in-order processor configuration, SoftHV obtains performance improvements of 72% and 47% for SPECFP and SPECINT, respectively. Overall, we show that such a co-designed processor based on an in-order core provides a compelling alternative to out-of-order processors for the low-end domain, where high performance at low complexity is a key feature.
    Proceedings of the 8th Conference on Computing Frontiers, 2011, Ischia, Italy, May 3-5, 2011; 01/2011
  • Javier Lira, Carlos Molina, David Brooks, Antonio González
    ABSTRACT: Advances in technology have allowed DRAM-like structures, called embedded DRAM (eDRAM), to be integrated into the chip. This technology has already been successfully implemented in some GPUs and other graphics-intensive SoCs, such as game consoles. The most recent processor from IBM, POWER7, is the first general-purpose processor that integrates an eDRAM module on the chip. In this paper, we propose a hybrid cache architecture that exploits the main features of both memory technologies: the speed of SRAM and the high density of eDRAM. We demonstrate that, due to the high locality found in emerging applications, a high percentage of the data that enters the on-chip last-level cache is never accessed again before being evicted. Based on that observation, we propose a placement scheme where re-accessed data blocks are stored in fast, but costly in terms of area and power, SRAM banks, while eDRAM banks store data blocks that have just arrived at the NUCA cache or were demoted from an SRAM bank. We show that a well-balanced SRAM/eDRAM NUCA cache can achieve performance similar to that of a NUCA cache composed of only SRAM banks, while reducing area by 15% and power consumption by 10%. Furthermore, we also explore several alternatives for exploiting the area reduction gained by using the hybrid architecture, resulting in an overall performance improvement of 4%.
    18th International Conference on High Performance Computing, HiPC 2011, Bengaluru, India, December 18-21, 2011; 01/2011
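The placement scheme described above can be sketched as a two-level policy: blocks entering the cache (or demoted from SRAM) go to eDRAM, and only blocks that are re-accessed are promoted to SRAM. The capacities and LRU eviction below are illustrative assumptions, not the paper's configuration.

```python
# Sketch of the hybrid NUCA placement policy: blocks that enter the
# last-level cache (or are demoted) are placed in dense eDRAM banks, and
# only blocks that are re-accessed are promoted to fast SRAM banks.
# Bank capacities and LRU eviction are illustrative.

from collections import OrderedDict

class HybridNuca:
    def __init__(self, sram_ways=2, edram_ways=4):
        self.sram = OrderedDict()   # fast, small: holds re-accessed blocks
        self.edram = OrderedDict()  # dense, slow: holds newly arrived blocks
        self.sram_ways, self.edram_ways = sram_ways, edram_ways

    def access(self, block):
        """Returns which bank type served the block ('sram'/'edram'/'miss')."""
        if block in self.sram:
            self.sram.move_to_end(block)
            return "sram"
        if block in self.edram:                 # re-access: promote to SRAM
            del self.edram[block]
            if len(self.sram) >= self.sram_ways:
                victim, _ = self.sram.popitem(last=False)
                self._fill_edram(victim)        # demote SRAM victim to eDRAM
            self.sram[block] = True
            return "edram"
        self._fill_edram(block)                 # miss: new blocks enter eDRAM
        return "miss"

    def _fill_edram(self, block):
        if len(self.edram) >= self.edram_ways:
            self.edram.popitem(last=False)      # evict oldest eDRAM block
        self.edram[block] = True

cache = HybridNuca()
print(cache.access("A"))  # miss: A enters eDRAM
print(cache.access("A"))  # hit in eDRAM, then promoted to SRAM
print(cache.access("A"))  # subsequent hits served by SRAM
```

Because blocks that are never re-accessed die quietly in eDRAM, the expensive SRAM banks are reserved for the hot working set, which is the observation the paper's area and power savings rest on.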
  • Yoel Caises, Antonio González, Enrique Leyva, Raúl Pérez
    ABSTRACT: Although there are several proposals in the instance selection field, none of them consistently outperforms the others over a wide range of domains. In recent years, many authors have come to the conclusion that the data must be characterized in order to apply the most suitable selection criterion in each case. In light of this hypothesis, herein we propose a set of measures to characterize databases. These measures were used in decision rules which, given their values for a database, select from a set of pre-selected methods the method, or combination of methods, that is expected to produce the best results. The rules were extracted from an empirical analysis of the behavior of several methods on several data sets, then integrated into an algorithm which was experimentally evaluated over 20 databases and with six different learning paradigms. The results were compared with those of five well-known state-of-the-art methods.
    Information Sciences. 01/2011; 181:4780-4798.
  • Proceedings of the 2010 International Symposium on Low Power Electronics and Design, 2010, Austin, Texas, USA, August 18-20, 2010; 01/2011

Publication Stats

2k Citations
21.87 Total Impact Points


  • 1996–2014
    • University of Granada
      • Department of Computer Science and Artificial Intelligence
      Granada, Andalusia, Spain
  • 1995–2010
    • Polytechnic University of Catalonia
      • Department of Computer Architecture (DAC)
      Barcelona, Catalonia, Spain
  • 2006
    • TOBB University of Economics and Technology
      Ankara, Turkey
    • Intel
      Santa Clara, California, United States
  • 2003
    • Universitat Rovira i Virgili
      • Department of Computer Engineering and Mathematics (DEIM)
      Tarragona, Catalonia, Spain
  • 2001
    • University of Murcia
      • Departamento de Ingeniería y Tecnología de Computadores
      Murcia, Murcia, Spain