Usama Malik

University of New South Wales, Kensington, New South Wales, Australia

Publications (13)

  • Source
    ABSTRACT: This paper presents a complete design of a reconfigurable architecture support system, called ACS (an addressless configuration support), which provides efficient access to non-contiguous reconfigurable locations in reconfigurable systems. ACS reduces the amount of partial reconfiguration information required by removing a large amount of addressing information and padding as found in Virtex-4 bitstreams. ACS improves significantly on the distTree architecture previously proposed by us. ACS introduces the selector block which connects the leaf nodes to a consecutive block of reconfiguration locations called a frame set. The system allows any number of leaf nodes customised to the size of the device, thereby providing much more flexibility. The hardware costs have also been reduced significantly over the distTree design. Together with the new marker loading mechanism, ACS is readily applicable to SRAM-based FPGAs. This new ACS system is benchmarked using eight real-world applications against a Virtex-4 device and the results show 6.83%-15.07% speedups when the reconfiguration granularity is set to a Virtex-4 frame.
    Field-Programmable Technology, 2008. FPT 2008. International Conference on; 01/2009
  • Source
    ABSTRACT: This paper describes a Virtex-4 based system for aligning the code-phase of a received GPS signal. The core operation involves a multiplication of the received signal with a local replica of the code followed by integration of all possible alignments within the period of one code epoch. A speedup proportional to the code length is thus achieved. We outline the proposed system, which stores the code in on-chip memory blocks and uses both dedicated DSP hardware and user logic to perform MAC operations in parallel. We study area, time and energy usage as the ratio of user logic to custom blocks is varied and identify a design point and corresponding device size for which energy usage is minimized.
    01/2007;
  • U. Malik, O. Diessel
    ABSTRACT: In line with Shannon's ideas, we define the entropy of FPGA reconfiguration to be the amount of information needed to configure a given circuit onto a given device. We propose using entropy as a gauge of the maximum configuration compression that can be achieved and determine the entropy of a set of 24 benchmark circuits for the Virtex device family. We demonstrate that simple off-the-shelf compression techniques such as Golomb encoding and hierarchical vector compression achieve compression results that are within 1-10% of the theoretical bound. We present an enhanced configuration memory system based on the hierarchical vector compression technique that accelerates reconfiguration in proportion to the amount of compression achieved. The proposed system demands little additional chip area and can be clocked at the same rate as the Virtex configuration clock.
    Field Programmable Logic and Applications, 2006. FPL '06. International Conference on; 09/2006
  • U. Malik, O. Diessel
    ABSTRACT: This paper presents a configuration memory architecture that offers fast FPGA reconfiguration. The underlying principle behind the design is the use of fine-grained partial reconfiguration that allows significant configuration re-use while switching from one circuit to another. The proposed configuration memory works by reading on-chip configuration data into a buffer, modifying them based on the externally supplied data and writing them back to their original registers. A prototype implementation of the proposed design in a 90nm cell library indicates that the new memory adds less than 1% area to a commercially available FPGA implemented using the same library. The proposed design reduces the reconfiguration time for a wide set of benchmark circuits by 63%. However, power consumption during reconfiguration increases by a factor of 2.5 because the read-modify-write strategy results in more switching in the memory array.
    Field Programmable Logic and Applications, 2005. International Conference on; 09/2005
  • Usama Malik, Oliver Diessel
    05/2005;
  • U. Malik, O. Diessel
    ABSTRACT: Dynamic FPGA reconfiguration represents an overhead that can be critical to the performance of a realised circuit. To address this problem, this work presents a technique that is applicable at the time of loading the configuration data onto the device. The technique involves reusing the on-chip configuration fragments to implement the next configuration, thereby reducing the amount of data that must be externally transferred to the configuration memory. This work provides an analysis of the effect of circuit placement and configuration granularity on configuration reuse. The problem of finding placements of each circuit in a sequence of circuits so as to maximize configuration reuse is considered in detail. A greedy solution to this NP-complete problem was found to reduce configuration overheads by less than 5% for a benchmark set. The effect of configuration granularity on configuration reuse was also considered, and it was found that reducing the size of the unit of configuration allowed us to reduce the size of the benchmark configurations by 41%.
    Field-Programmable Technology, 2004. Proceedings. 2004 IEEE International Conference on; 01/2005
  • Marco Torre, Usama Malik, Oliver Diessel
    ABSTRACT: This paper presents an investigation and design of an enhanced on-chip configuration memory system that can reduce the time to (re)configure an FPGA. The proposed system accepts configuration data in a compressed form and performs decompression internally. The resulting FPGA can be (re)configured in time proportional to the size of the compressed bit-stream. The compression technique exploits the redundancy present in typical configuration data. An analysis of configurations corresponding to a set of benchmark circuits reveals that data that controls the same types of configurable elements have a common byte that occurs at a significantly higher frequency. This common byte is simply broadcast to all instances of that element. This step is followed by byte updates if required. The new configuration system has modest hardware requirements and was observed to reduce reconfiguration time for the benchmark set by two-thirds on average.
    Advances in Computer Systems Architecture, 10th Asia-Pacific Conference, ACSAC 2005, Singapore, October 24-26, 2005, Proceedings; 01/2005
  • U. Malik, K. So, O. Diessel
    ABSTRACT: The Circal process algebra is being used to explore the behavioural specification of systems that are mapped to field programmable logic circuits. In this paper we report on the implementation and performance of an interpreter for system specifications given in the Circal language. In contrast to the typical design flow for field programmable technology in which designs are statically partitioned, synthesised, and mapped to pre-allocated resources, in this system the specified circuits are extracted from behavioural specifications that are partitioned, elaborated, mapped, and configured at run time as control passes through them. We report on the details of a design that targets the Celoxica RC1000 co-processor and assess preliminary performance results for this implementation. The results clearly demonstrate our method is a practical approach to overcome resource constraints, particularly in applications where these change at run time. The results also establish a benchmark against which to measure future improvements and alternative methods.
    Field-Programmable Technology, 2002. (FPT). Proceedings. 2002 IEEE International Conference on; 01/2003
  • Oliver Diessel, Usama Malik
    ABSTRACT: This paper describes the design of an interpreter that overcomes FPGA resource limitations for a class of control-oriented circuits by automatically partitioning, elaborating, and loading circuit components as directed by their execution. By providing a virtual hardware management facility, this enables us to implement large systems, specified in Circal, on small FPGA chips.
    16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 15-19 April 2002, Fort Lauderdale, FL, USA, CD-ROM/Abstracts Proceedings; 01/2002
  • Oliver Diessel, Usama Malik, Keith So
    ABSTRACT: Current FPGA design flows do not readily support high-level, behavioural design or the use of run-time reconfiguration. Designers are thus discouraged from taking a high-level view of their systems and cannot fully exploit the benefits of programmable hardware. This paper reports on our advances towards the development of design technology that supports behavioural specification and compilation of FPGA designs and automatically manages FPGA chip virtualization.
    Euro-Par 2002, Parallel Processing, 8th International Euro-Par Conference Paderborn, Germany, August 27-30, 2002, Proceedings; 01/2002
  • Oliver Diessel, Usama Malik, Keith So
    ABSTRACT: Current FPGA design flows do not readily support high-level, behavioural design or the use of run-time reconfiguration. Designers are thus discouraged from taking a high-level view of their systems and cannot fully exploit the benefits of programmable hardware. This paper reports on our advances towards the development of design technology that supports behavioural specification and compilation of FPGA designs and automatically manages FPGA chip virtualization.
    01/2002;
  • Usama Malik
  • Usama Malik, Oliver Diessel
    ABSTRACT: This report presents a configuration memory architecture that offers fast FPGA reconfiguration. The underlying principle behind the design is the use of fine-grained partial reconfiguration that allows significant configuration re-use while switching from one circuit to another. The proposed configuration memory works by reading on-chip configuration data into a buffer, modifying them based on the externally supplied data and writing them back to their original registers. A prototype implementation of the proposed design in a 90nm cell library indicates that the new memory adds less than 1% area to a commercially available FPGA implemented using the same library. The proposed design reduces the reconfiguration time for a wide set of benchmark circuits by 63%. However, power consumption during reconfiguration increases by a factor of 2.5 because the read-modify-write strategy results in more switching in the memory array.
    TR059, Draft 2

Publication Stats

46 Citations

Institutions

  • 2005–2009
    • University of New South Wales
      • School of Computer Science and Engineering
      Kensington, New South Wales, Australia
  • 2003
    • University of South Wales
      Pontypridd, Wales, United Kingdom