Conference Paper

Adaptive FPGAs: High-Level Architecture and a Synthesis Method

To read the full-text of this research, you can request a copy directly from the authors.


This paper presents preliminary work exploring adaptive field-programmable gate arrays (AFPGAs). An AFPGA is adaptive in the sense that the functionality of subcircuits placed on the chip can change in response to changes observed on certain control signals. We describe the high-level architecture, which adds control logic and SRAM bits to a traditional FPGA to produce an AFPGA. We also describe a synthesis method that identifies and resynthesizes mutually exclusive pieces of logic so that they may share the resources available in an AFPGA. The architectural feature and its associated synthesis method help reduce circuit size by 28% on average and by up to 40% on select circuits.
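The resource-sharing idea in the abstract can be illustrated with a minimal sketch. It assumes a lookup table that stores one truth table per control state, so two mutually exclusive functions occupy a single logic element. The class `AdaptiveLUT`, the helper `truth_table`, and the single `mode` control input are hypothetical names chosen for illustration; the abstract does not describe the actual AFPGA circuit at this level of detail.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class AdaptiveLUT:
    """Hypothetical K-input LUT with one SRAM truth table per control state."""
    k: int
    tables: List[List[int]]  # tables[mode][input_index] -> output bit

    def evaluate(self, mode: int, inputs: Sequence[int]) -> int:
        """Return the output bit of the truth table selected by `mode`."""
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs, got {len(inputs)}")
        index = 0
        for bit in inputs:  # first input is the most significant index bit
            index = (index << 1) | (bit & 1)
        return self.tables[mode][index]


def truth_table(func: Callable[..., int], k: int) -> List[int]:
    """Enumerate a k-input Boolean function into a 2**k entry table."""
    table = []
    for i in range(2 ** k):
        bits = [(i >> (k - 1 - j)) & 1 for j in range(k)]
        table.append(func(*bits) & 1)
    return table


# Two functions assumed to be mutually exclusive (never needed at the same time).
and2 = truth_table(lambda a, b: a & b, 2)
xor2 = truth_table(lambda a, b: a ^ b, 2)

# One physical LUT serves both functions; the control signal picks the table.
lut = AdaptiveLUT(k=2, tables=[and2, xor2])
```

With `mode` set to 0 the element behaves as an AND gate and with `mode` set to 1 as an XOR gate, which is the kind of sharing that lets two subcircuits take the place of one.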

Conference Paper
In order to accelerate an algorithm for test generation, it is necessary to reduce the number of backtracks in the algorithm and to shorten the process time between backtracks. In this paper we consider several techniques to accelerate test generation and present a new test generation algorithm called FAN (FANout-oriented test generation algorithm). It is shown that FAN algorithm is faster and more efficient than the PODEM algorithm reported by Goel. We also present an automatic test generation system composed of the FAN algorithm and the concurrent fault simulation. Experimental results on large combinational circuits of up to 3000 gates demonstrate that the system performs test generation very fast and effectively.
A transitive-closure-based test generation algorithm is presented. A test is obtained by determining signal values that satisfy a Boolean equation derived from the neural network model of the circuit incorporating necessary conditions for fault activation and path sensitization. The algorithm is a sequence of two main steps that are repeatedly executed: transitive closure computation and decision-making. A key feature of the algorithm is that dependences derived from the transitive closure are used to reduce ternary relations to binary relations that in turn dynamically update the transitive closure. The signals are either determined from the transitive closure or are enumerated until the Boolean equation is satisfied. Experimental results on the ISCAS 1985 and the combinational parts of ISCAS 1989 benchmark circuits are presented to demonstrate efficient test generation and redundancy identification. Results on four state-of-the-art production VLSI circuits are also presented
SIS is an interactive tool for synthesis and optimization of sequential circuits. Given a state transition table, a signal transition graph, or a logic-level description of a sequential circuit, it produces an optimized net-list in the target technology while preserving the sequential input-output behavior. Many different programs and algorithms have been integrated into SIS, allowing the user to choose among a variety of techniques at each stage of the process. It is built on top of MISII [5] and includes all (combinational) optimization techniques therein as well as many enhancements. SIS serves as both a framework within which various algorithms can be tested and compared, and as a tool for automatic synthesis and optimization of sequential circuits. This paper provides an overview of SIS. The first part contains descriptions of the input specification, STG (state transition graph) manipulation, new logic optimization and verification algorithms, ASTG (asynchronous signal tr...
Field-Programmable Gate Arrays (FPGAs) and Single-Instruction Multiple-Data (SIMD) processing arrays share many architectural features. In both architectures, an array of simple, fine-grained logic elements is employed to provide high-speed, customizable, bit-wise computation. In this paper, we present a unified computational array model which encompasses both FPGAs and SIMD arrays. Within this framework, we examine the differences and similarities between these array structures and touch upon techniques and lessons which can be transferred between the architectures. The unified model also exposes promising prospects for hybrid array architectures. We introduce the Dynamically Programmable Gate Array (DPGA) which combines the best features from FPGAs and SIMD arrays into a single array architecture.
From the Publisher: Architecture and CAD for Deep-Submicron FPGAs addresses several key issues in the design of high-performance FPGA architectures and CAD tools, with particular emphasis on issues that are important for FPGAs implemented in deep-submicron processes. Three factors combine to determine the performance of an FPGA: the quality of the CAD tools used to map circuits into the FPGA, the quality of the FPGA architecture, and the electrical (i.e. transistor-level) design of the FPGA. Architecture and CAD for Deep-Submicron FPGAs examines all three of these issues in concert.
Conference Paper
This paper presents experimental measurements of the differences between a 90nm CMOS FPGA and 90nm CMOS Standard Cell ASICs in terms of logic density, circuit speed and power consumption. We are motivated to make these measurements to enable system designers to make better informed choices between these two media and to give insight to FPGA makers on the deficiencies to attack and thereby improve FPGAs. In the paper, we describe the methodology by which the measurements were obtained and we show that, for circuits containing only combinational logic and flip-flops, the ratio of silicon area required to implement them in FPGAs and ASICs is on average 40. Modern FPGAs also contain "hard" blocks such as multiplier/accumulators and block memories and we find that these blocks reduce this average area gap significantly to as little as 21. The ratio of critical path delay, from FPGA to ASIC, is roughly 3 to 4, with less influence from block memory and hard multipliers. The dynamic power consumption ratio is approximately 12 times and, with hard blocks, this gap generally becomes smaller.
Conference Paper
The authors present a novel test pattern generation algorithm which uses the concept of necessary assignments to reduce or eliminate backtracking in automatic test pattern generation. Necessary assignments are those which must be made in order to find a test pattern; without them the search is guaranteed to fail. The algorithm is based on the mathematical concept of images and inverse images of set functions. In order to take advantage of formal concepts developed for Boolean algebras, the algorithm uses a 16-valued algebra. It has been used to generate test patterns for all faults in a variety of benchmark circuits. Experimental results indicate that the algorithm is particularly efficient at redundancy identification, which is often a problem for conventional test pattern generation algorithms. The benefits of a 16-valued system are illustrated through examples of faults which are not properly handled by conventional 5- or 9-valued systems
Conference Paper
In this paper, we investigate the speed and area-efficiency of FPGAs employing "logic clusters" containing multiple LUTs and registers as their logic block. We introduce a new, timing-driven tool (T-VPack) to "pack" LUTs and registers into these logic clusters, and we show that this algorithm is superior to an existing packing algorithm. Then, using a realistic routing architecture and sophisticated delay and area models, we empirically evaluate FPGAs composed of clusters ranging in size from one to twenty LUTs, and show that clusters of size seven through ten provide the best area-delay trade-off. Compared to circuits implemented in an FPGA composed of size one clusters, circuits implemented in an FPGA with size seven clusters have 30% less delay (a 43% increase in speed) and require 8% less area, and circuits implemented in an FPGA with size ten clusters have 34% less delay (a 52% increase in speed), and require no additional area.
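The paired figures in this abstract (30% less delay alongside a 43% speed increase, and 34% alongside 52%) follow from speed being the reciprocal of delay. The sketch below reproduces that arithmetic; the function name `speed_increase_from_delay_reduction` is a hypothetical helper, not something taken from the cited paper.

```python
def speed_increase_from_delay_reduction(delay_reduction: float) -> float:
    """Convert a fractional delay reduction into a fractional speed increase.

    Speed is proportional to 1 / delay, so a new delay of (1 - r) times the
    old delay gives a speed of 1 / (1 - r) times the old speed.
    """
    if not 0.0 <= delay_reduction < 1.0:
        raise ValueError("delay_reduction must be in [0, 1)")
    return 1.0 / (1.0 - delay_reduction) - 1.0


# Delay reductions quoted in the abstract for size-7 and size-10 clusters.
size7_speedup = speed_increase_from_delay_reduction(0.30)
size10_speedup = speed_increase_from_delay_reduction(0.34)

print(f"size-7 clusters:  {size7_speedup:.1%} faster")
print(f"size-10 clusters: {size10_speedup:.1%} faster")
```

Rounded to whole percentages, the two results match the 43% and 52% figures stated in the abstract.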
Conference Paper
This paper describes the architecture of a time-multiplexed FPGA. Eight configurations of the FPGA are stored in on-chip memory. This inactive on-chip memory is distributed around the chip, and accessible so that the entire configuration of the FPGA can be changed in a single cycle of the memory. The entire configuration of the FPGA can be loaded from this on-chip memory in 30 ns. Inactive memory is accessible as block RAM for applications. The FPGA is based on the Xilinx XC4000E FPGA, and includes extensions for dealing with state saving and forwarding and for increased routing demand due to time-multiplexing the hardware
Conference Paper
While modern FPGAs often contain clusters of 4-input lookup tables and flip flops, little is known about good choices for two key architectural parameters: the number of these basic logic elements (BLEs) in each cluster, and the total number of distinct inputs that the programmable routing can provide to each cluster. In this paper we explore the effect of these parameters on FPGA area-efficiency. We show that a cluster containing N BLEs needs only 2N+2 distinct inputs (vs. the 4N maximum) to achieve complete logic utilization. Secondly, we find that a cluster size of 4 is most area-efficient, and leads to an FPGA that is 5-10% more area-efficient than an FPGA based on a single BLE logic block
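To make the 2N+2 result concrete, the sketch below compares the worst-case number of distinct cluster inputs (4N for 4-input LUTs) with the 2N+2 inputs the abstract reports as sufficient for complete logic utilization. The function name `cluster_input_counts` and the dictionary layout are assumptions made for illustration, and the formula is applied only to the 4-input LUT case the abstract covers.

```python
from typing import Dict


def cluster_input_counts(n_bles: int) -> Dict[str, int]:
    """Input counts for a cluster of n_bles BLEs built from 4-input LUTs."""
    if n_bles < 1:
        raise ValueError("a cluster needs at least one BLE")
    maximum = 4 * n_bles         # every LUT input driven by a distinct signal
    sufficient = 2 * n_bles + 2  # figure reported in the abstract
    return {
        "maximum": maximum,
        "sufficient": sufficient,
        "saved": maximum - sufficient,
    }


# Cluster size 4 is the one the abstract reports as most area-efficient.
counts_n4 = cluster_input_counts(4)
print(counts_n4)
```

The saving grows as 2N-2, so it is zero for a single-BLE block and increases with cluster size.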
Conference Paper
This paper presents a new static logic implication algorithm. An improved implication procedure that fully takes advantage of the special context of static implication, the iterative method, and set algebra is described. The algorithm discovers at low cost many indirect implications which are not discovered by dynamic learning without tremendous time cost. The experimental results show that a very large number of indirect implications are found by our algorithm. The static implication procedure has many useful applications, one of which is static redundancy identification. Use of the static implications obtained from the algorithm in static redundancy identification for ISCAS85 combinational circuits resulted in a larger number of redundant faults identified than in previous methods
Conference Paper
This paper presents an architecture for a FPGA oriented towards logic emulation, to achieve maximum usable logic density per unit silicon area, and fast mapping. Logic circuits are translated into a program that is executed sequentially by a network of processor elements. Overall, a sevenfold increase in raw logic blocks, and a 25-fold increase in usable logic blocks compared to a FPGA-based logic emulator is expected for a given silicon area
In this paper, we revisit the field-programmable gate-array (FPGA) architectural issue of the effect of logic block functionality on FPGA performance and density. In particular, in the context of lookup table, cluster-based island-style FPGAs (Betz et al. 1997) we look at the effect of lookup table (LUT) size and cluster size (number of LUTs per cluster) on the speed and logic density of an FPGA. We use a fully timing-driven experimental flow (Betz et al. 1997), (Marquardt, 1999) in which a set of benchmark circuits are synthesized into different cluster-based (Betz and Rose, 1997, 1998) and (Marquardt, 1999) logic block architectures, which contain groups of LUTs and flip-flops. Across all architectures with LUT sizes in the range of 2 to 7 inputs, and cluster size from 1 to 10 LUTs, we have experimentally determined the relationship between the number of inputs required for a cluster as a function of the LUT size (K) and cluster size (N). Second, contrary to previous results, we have shown that clustering small LUTs (sizes 2 and 3) produces better area results than what was presented in the past. However, our results also show that the performance of FPGAs with these small LUT sizes is significantly worse (by almost a factor of 2) than larger LUTs. Hence, as measured by area-delay product, or by performance, these would be a bad choice. Also, we have discovered that LUT sizes of 5 and 6 produce much better area results than were previously believed. Finally, our results show that a LUT size of 4 to 6 and cluster size of between 3-10 provides the best area-delay product for an FPGA.
This paper presents experimental measurements of the differences between a 90-nm CMOS field programmable gate array (FPGA) and 90-nm CMOS standard-cell application-specific integrated circuits (ASICs) in terms of logic density, circuit speed, and power consumption for core logic. We are motivated to make these measurements to enable system designers to make better informed choices between these two media and to give insight to FPGA makers on the deficiencies to attack and, thereby, improve FPGAs. We describe the methodology by which the measurements were obtained and show that, for circuits containing only look-up table-based logic and flip-flops, the ratio of silicon area required to implement them in FPGAs and ASICs is on average 35. Modern FPGAs also contain "hard" blocks such as multiplier/accumulators and block memories. We find that these blocks reduce this average area gap significantly to as little as 18 for our benchmarks, and we estimate that extensive use of these hard blocks could potentially lower the gap to below five. The ratio of critical-path delay, from FPGA to ASIC, is roughly three to four with less influence from block memory and hard multipliers. The dynamic power consumption ratio is approximately 14 times and, with hard blocks, this gap generally becomes smaller
In this paper, an iterative technology-mapping tool called IMap is presented. It supports depth-oriented (area is a secondary objective), area-oriented (depth is a secondary objective), and duplication-free mapping modes. The edge-delay model (as opposed to the more commonly used unit-delay model) is used throughout. Two new heuristics are used to obtain area reductions over previously published methods. The first heuristic predicts the effects of various mapping decisions on the area of the final solution, and the second heuristic bounds the depth of the mapping solution at each node. In depth-oriented mode, when targeting five lookup tables (LUTs), IMap obtains depth optimal solutions that are 44.4%, 19.4%, and 5% smaller than those produced by FlowMap, CutMap, and DAOMap, respectively. Targeting the same LUT size in area-oriented mode, IMap obtains solutions that are 17.5% and 9.4% smaller than those produced by duplication-free mapping and ZMap, respectively. IMap is also shown to be highly efficient. Runtime improvements of between 2.3× and 82× are obtained over existing algorithms when targeting five LUTs. Area and runtime results comparing IMap to the other mappers when targeting four and six LUTs are also presented.
An automatic test pattern generation system, SOCRATES, is presented. SOCRATES includes several novel concepts and techniques that significantly improve and accelerate the automatic test pattern generation process for combinational and scan-based circuits. Based on the FAN algorithm, improved implication, sensitization, and multiple backtrace procedures are described. The application of these techniques leads to a considerable reduction of the number of backtrackings and an earlier recognition of conflicts and redundancies. Several experiments using a set of combinational benchmark circuits demonstrate the efficiency of SOCRATES and its cost-effectiveness, even in a workstation environment
The relationship between the routability of a field-programmable gate array (FPGA) and the flexibility of its interconnection structures is examined. The flexibility of an FPGA is determined by the number and distribution of switches used in the interconnection. While good routability can be obtained with a high flexibility, a large number of switches will result in poor performance and logical density because each switch has significant delay and area. The minimum number of switches required to achieve good routability is determined by implementing several industrial circuits in a variety of interconnection architectures. These experiments indicate that high flexibility is essential for the connection block that joins the logic blocks to the routing channel, but a relative low flexibility is sufficient for switch blocks at the junction of horizontal and vertical channels. Furthermore, it is necessary to use only a few more routing tracks than the absolute minimum possible with structures of surprisingly low flexibility
The relationship between the functionality of a field-programmable gate array (FPGA) logic block and the area required to implement digital circuits using that logic block is examined. The investigation is done experimentally by implementing a set of industrial circuits as FPGAs using CAD (computer-aided design) tools for technology mapping, placement, and routing. A range of programming technologies (the method of FPGA customization) is explored using a simple model of the interconnection and logic block area. The experiments are based on logic blocks that use lookup tables for implementing combinational logic. Results indicate that the best number of inputs to use (a measure of the block's functionality) is between three and four, and that a D flip-flop should be included in the logic block. The results are largely independent of the programming technology. More generally, it was observed that the area efficiency of a logic block depends not only on its functionality but also on the average number of pins connected per logic block
The routing architecture of an FPGA consists of the length of the wires, the type of switch used to connect wires (buffered, unbuffered, fast or slow) and the topology of the interconnection of the switches and wires. FPGA routing architecture has a major influence on the logic density and speed of FPGA devices. Previous work [1] based on a 0.35 µm CMOS process has suggested an architecture consisting of length 4 wires (where the length of a wire is measured in terms of the number of logic blocks it passes before being switched) in which half of the programmable switches are active buffers and half are pass transistors. In that work, however, the topology of the routing architecture prevented buffered tracks from connecting to pass-transistor tracks. This restriction prevents the creation of interconnection trees for high fanout nets that have a mixture of buffers and pass transistors. Electrical simulations suggest that connections closer to the leaves on interconnection trees are faster using pass transistors, but it is essential to buffer closer to the source. This latter effect is well known in regular ASIC routing [2].
In this paper we revisit the FPGA architectural issue of the effect of logic block functionality on FPGA performance and density. In particular, in the context of lookup table, cluster-based island-style FPGAs [4] we look at the effect of lookup table (LUT) size and cluster size (number of LUTs per cluster) on the speed and logic density of an FPGA. Although this question was addressed some time ago in [17] [18] [12] [13] [10] and [22], several reasons compelled us to revisit the issue. First, prior work focused on non-clustered logic blocks, which are known to have a significant impact on the area and delay [16]. Second, most prior studies tended to look at area or delay, but not both as we will here. Third, prior results were based on IC process generations that are several factors larger than current process generations, and so do not take deep-submicron electrical effects into account. In the present work, we perform detailed spice-level simulations of circuits and perform appropri...
In this work we investigate the routing architecture of FPGAs, focusing primarily on determining the best distribution of routing segment lengths and the best mix of pass transistor and tri-state buffer routing switches. While most commercial FPGAs contain many length 1 wires (wires that span only one logic block) we find that wires this short lead to FPGAs that are inferior in terms of both delay and routing area. Our results show instead that it is best for FPGA routing segments to have lengths of 4 to 8 logic blocks. We also show that 50% to 80% of the routing switches in an FPGA should be pass transistors, with the remainder being tri-state buffers. Architectures that employ the best segmentation distributions and the best mixes of pass transistor and tri-state buffer switches found in this paper are not only 11% to 18% faster than a routing architecture very similar to that of the Xilinx XC4000X but also considerably simpler. These results are obtained using an architecture invest...
Altera Device Handbook
  • Altera
Xilinx Device Handbook
  • Xilinx