IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Published by Institute of Electrical and Electronics Engineers
Print ISSN: 0278-0070
Publications
This paper reports a unified triode/saturation model with improved continuity in the output conductance, suitable for CAD of VLSI circuits using deep sub-0.1 μm NMOS devices. As verified by experimental data, the model accurately predicts the output conductance characteristics.
 
Simulation and experimental results are presented for an active-noise-suppression technique to reduce substrate crosstalk in mixed-signal IC technology. The method utilizes a 3-D distributed resistive-capacitive substrate model, along with a BiCMOS wideband differential noise suppression amplifier (NSA) designed in IBM's 0.18-μm 7WL BiCMOS technology. Simulation results for a GR-defined “quiet” region predict a noise suppression factor of -6 dB over a frequency range of 10 MHz-2 GHz at the center of the region with a peak suppression of -14 dB at the point of the NSA connection to the guard ring (GR). BF-Moat and P<sup>+</sup> region/deep trench/n-well GR isolation structures were also integrated into the 3-D substrate model, for investigation of their isolation ability. The simulated substrate-noise suppression and isolation results were verified with an experimental test site, designed and fabricated in 7WL technology. Measurements of both noise suppression and isolation factors were compared to the simulation results and to predictions derived from an analytical model.
 
A behavioral model of a 1.8-V, 6-bit flash analog-to-digital converter has been developed based on device parameters using the g<sub>m</sub>/I<sub>d</sub> methodology. This approach eliminates the need for recharacterization of blocks when device sizes are changed. Furthermore, the performance can be predicted with input only from device and process simulators, eliminating the need for a circuit simulator and associated model parameters. Signal-to-noise-plus-distortion ratio and differential and integral nonlinearity are predicted and verified at lower resolution with a circuit simulator.
 
We present a methodology for the watermarking of synchronous sequential circuits that makes it possible to identify the authorship of designs by imposing a digital watermark on the state transition graph (STG) of the circuit. The methodology is applicable to sequential designs that are made available as firm intellectual property, the designation commonly used to characterize designs specified as structural hardware description languages or circuit netlists. The watermarking is obtained by manipulating the STG of the design in such a way as to make it exhibit a chosen property that is extremely rare in nonwatermarked circuits while, at the same time, not changing the functionality of the circuit. This manipulation is performed without ever actually computing this graph in either implicit or explicit form. Instead, the digital watermark is obtained by direct manipulation of the circuit description. We present evidence that no known algorithms for circuit manipulation can be used to efficiently remove or change the watermark and that the process is immune to a variety of other attacks. We present both theoretical and experimental results that show that the watermarking can be created and verified efficiently. We also test possible attack strategies and verify that they are inapplicable to realistic designs of medium to large complexity
 
In this paper, we introduce PVL, an algorithm for computing the Padé approximation of Laplace-domain transfer functions of large linear networks via a Lanczos process. The PVL algorithm has significantly superior numerical stability, while retaining the same efficiency as algorithms that compute the Padé approximation directly through moment matching, such as AWE and its derivatives. As a consequence, it produces more accurate and higher-order approximations, and it renders unnecessary many of the heuristics that AWE and its derivatives had to employ. The algorithm also computes an error bound that makes it possible to identify the true poles and zeros of the original network. We present results of numerical experiments with the PVL algorithm for several large examples.
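
The moment-matching property referred to above can be written out as a short derivation. This is a sketch under assumed notation that does not appear in the abstract: G and C are taken to be the conductance and storage matrices of a modified nodal formulation, b and l the input and output selector vectors, and s<sub>0</sub> the expansion point.

```latex
\begin{align*}
H(s) &= l^{T}(G + sC)^{-1} b, \qquad s = s_0 + \sigma, \\
A &= -(G + s_0 C)^{-1} C, \qquad r = (G + s_0 C)^{-1} b, \\
H(s_0 + \sigma) &= l^{T}(I - \sigma A)^{-1} r
  = \sum_{k \ge 0} m_k \sigma^{k}, \qquad m_k = l^{T} A^{k} r, \\
H_q(s_0 + \sigma) &= H(s_0 + \sigma) + O(\sigma^{2q}).
\end{align*}
```

Here H<sub>q</sub> is the q-th order Padé approximant, which matches the first 2q moments m<sub>k</sub>. Direct moment matching forms the m<sub>k</sub> explicitly, which tends to become ill-conditioned as q grows; a Lanczos-based approach obtains the same approximant without computing the moments one by one.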
 
In this paper, we propose a new method for the synthesis of 1-D 90/150 linear-hybrid-group cellular automata for CA-polynomials. We obtain large-cell CA very rapidly using our algorithm. This algorithm is efficient and suitable for all practical applications.
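
For readers unfamiliar with 90/150 hybrid cellular automata, the following minimal sketch shows how such a CA evolves. It is not the synthesis algorithm of the paper; the function names, the null boundary condition, and the example rule vector are assumptions chosen for illustration. Under rule 90 a cell becomes the XOR of its two neighbours, and under rule 150 it becomes the XOR of its two neighbours and itself.

```python
def step(state, rules):
    """Advance a 1-D null-boundary 90/150 hybrid CA by one time step.

    state: list of 0/1 cell values.
    rules: list of 90 or 150, one rule per cell (hypothetical encoding).
    """
    n = len(state)
    nxt = []
    for i, rule in enumerate(rules):
        left = state[i - 1] if i > 0 else 0       # null boundary on the left
        right = state[i + 1] if i < n - 1 else 0  # null boundary on the right
        centre = state[i] if rule == 150 else 0   # rule 150 also uses the cell itself
        nxt.append(left ^ centre ^ right)
    return nxt


def cycle_length(seed, rules):
    """Number of steps until the seed state recurs.

    Assumes the CA is a group CA (invertible), so every state lies on a cycle.
    """
    state = step(seed, rules)
    count = 1
    while state != seed:
        state = step(state, rules)
        count += 1
    return count


if __name__ == "__main__":
    rules = [90, 150, 90, 150]   # example rule vector; characteristic polynomial x^4 + x + 1
    print(cycle_length([1, 0, 0, 0], rules))
```

The example rule vector has the primitive characteristic polynomial x<sup>4</sup> + x + 1 over GF(2), so every nonzero state lies on a single cycle of length 15.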
 
An interconnect diagnosis scheme based on the oscillation ring (OR) test methodology for systems-on-chip (SOC) design with heterogeneous cores is proposed. In addition to traditional stuck-at and open faults, the OR test can also detect and diagnose important interconnect faults such as delay faults and crosstalk glitches. The large number of test rings in the SOC design, however, significantly complicates the interconnect diagnosis problem. In this paper, the diagnosability of an interconnect structure is first analyzed then a fast diagnosability checking algorithm and an efficient diagnosis ring generation algorithm are proposed. It is shown in this paper that the generation algorithm achieves the maximum diagnosability for any interconnect. Two optimization techniques are also proposed, an adaptive and a concurrent diagnosis method, to improve the efficiency and effectiveness of interconnect diagnosis. Experiments on the MCNC benchmark circuits show the effectiveness of the proposed diagnosis algorithms. In all experiments, the method achieves 100% fault detection coverage and the optimal interconnect diagnosis resolution
 
Diagnosis ring generation procedure.
An adaptive diagnosis tree.
We propose an interconnect diagnosis scheme based on oscillation ring test methodology for SOC design with heterogeneous cores. The target fault models are delay faults and crosstalk glitches. We analyze the diagnosability of an interconnect structure and propose a fast diagnosability checking algorithm and an efficient diagnosis ring generation algorithm which achieves the optimal diagnosability. Two optimization techniques improve the efficiency and effectiveness of interconnect diagnosis. In all experiments, our method achieves 100% fault coverage and the optimal diagnosis resolution.
 
In this paper, a time and memory-efficient diagnostic fault simulator for sequential circuits is first presented. A distributed diagnostic fault simulator is then presented based on the sequential algorithm to improve the speed of the diagnostic process. In the sequential diagnostic fault simulator, the number of fault-pair output response comparisons has been minimized by using an indistinguishability fault list that stores the faults that are indistinguishable from each fault. Due to the symmetrical relationship of the fault-pair distinguishability, fault list sizes are reduced. Therefore, the different diagnostic measures of a given test set can be generated very quickly using a small amount of memory. To further speed up the process of finding the indistinguishable fault list for each fault, a distributed approach is proposed and developed. The major idea for this approach is that each processor constructs the indistinguishable fault lists for a certain percentage of faults only. Experimental results show that the sequential diagnostic fault simulator runs faster and uses less memory than a previously developed one and that the distributed algorithm even achieves superlinear speedup for a very large sequential benchmark circuit, s35932. To the authors' knowledge, no distributed diagnostic fault simulation system for sequential circuits has been proposed before
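
The role of the indistinguishability fault lists can be illustrated with a small sketch. It is not the simulator described above: the fault names, the response strings, and the function names are hypothetical, and the output responses are assumed to be already available. Because distinguishability is symmetric, each indistinguishable pair is stored only once, in the list of the fault that comes first in sorted order.

```python
def indistinguishable_lists(responses):
    """Map each fault to the later faults with an identical output response.

    responses: dict fault name -> output response under the test set
               (hypothetical data; any hashable, comparable value works).
    Each pair (f, g) is recorded once, under the earlier fault f.
    """
    faults = sorted(responses)
    lists = {f: [] for f in faults}
    for i, f in enumerate(faults):
        for g in faults[i + 1:]:
            if responses[f] == responses[g]:
                lists[f].append(g)
    return lists


def fully_distinguished(lists):
    """Faults that are distinguished from every other fault."""
    paired = {g for partners in lists.values() for g in partners}
    return [f for f, partners in lists.items() if not partners and f not in paired]


if __name__ == "__main__":
    responses = {"f1": "0110", "f2": "0110", "f3": "1110", "f4": "0010"}
    lists = indistinguishable_lists(responses)
    print(lists)
    print(fully_distinguished(lists))
```

In a distributed setting, one natural partition is to give each processor a slice of the outer loop, so that it builds the lists for only its share of the faults.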
 
We propose a procedure for generating test sequences for diagnosis of synchronous sequential circuits based on stuck-at faults. In this procedure, we avoid the conventional fault-oriented test generation process by observing that a sequence to distinguish two faults can be obtained from a sequence T that detects both of the faults (such as a test sequence for fault detection) by changing T so as to “undetect” one of the faults, or change the time units or outputs where the fault is detected. To achieve this goal, the proposed procedure eliminates parts of T so as to render some of the faults undetected, or change their detection times or outputs. In the case where faults become undetected by the modified sequence, the detected faults are distinguished from the faults left undetected by the modified sequence based on pass/fail information. A pass/fail dictionary based on modified test sequences is proposed for this case. Alternatively, a standard dictionary can be used, and the proposed procedure can be used to change the time units or outputs where faults are detected in order to distinguish them. We present experimental results to demonstrate the levels of resolution that can be obtained by the proposed procedure with the proposed pass/fail dictionary, and the number of sequences required for this purpose.
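
A pass/fail dictionary of the kind mentioned above can be sketched as follows. The sequence names, fault names, and detection sets are invented for illustration, and the fault simulation that would produce them is assumed to have been done elsewhere. Each fault gets one entry per sequence: "F" if the sequence detects it, "P" otherwise.

```python
def build_dictionary(detected_by):
    """Build a pass/fail dictionary.

    detected_by: dict sequence name -> set of faults detected by that sequence
                 (hypothetical data; insertion order fixes the column order).
    Returns dict fault -> tuple of "F"/"P", one entry per sequence.
    """
    faults = sorted(set().union(*detected_by.values()))
    return {
        f: tuple("F" if f in hits else "P" for hits in detected_by.values())
        for f in faults
    }


def indistinguished_pairs(dictionary):
    """Fault pairs whose pass/fail signatures are identical."""
    faults = sorted(dictionary)
    return [
        (f, g)
        for i, f in enumerate(faults)
        for g in faults[i + 1:]
        if dictionary[f] == dictionary[g]
    ]


if __name__ == "__main__":
    # T detects both faults, so T alone cannot tell them apart.
    only_t = build_dictionary({"T": {"a", "b"}})
    print(indistinguished_pairs(only_t))
    # T_mod is T with a part removed so that fault b is no longer detected.
    with_mod = build_dictionary({"T": {"a", "b"}, "T_mod": {"a"}})
    print(indistinguished_pairs(with_mod))
```

Adding the modified sequence gives faults a and b different signatures, which is the effect the "undetect" step is meant to achieve.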
 
This paper presents a robust global router called NTHU-Route 2.0 that improves the solution quality and runtime of NTHU-Route by the following enhancements: 1) a new history based cost function; 2) new ordering methods for congested region identification and rip-up and reroute; and 3) two implementation techniques. We report convincing experimental results to show the effectiveness of each individual enhancement. With all these enhancements together, NTHU-Route 2.0 solves all ISPD98 benchmarks with very good quality. Moreover, NTHU-Route 2.0 routes 7 of 8 ISPD07 benchmarks and 12 of 16 ISPD08 benchmarks without any overflow. Compared with other state-of-the-art global routers, NTHU-Route 2.0 is able to produce better solution quality and/or run more efficiently.
 
An important phase in the single row routing approach for multilayer printed circuit board routing is the via assignment phase. The via assignment problem, whose objective is to minimize the number of via columns used, is known to be NP-hard even when each net includes at most three vias. In this paper, we develop efficient approximation algorithms for solving this problem. When the size of each net is bounded by three, we present an algorithm which guarantees that the solution generated uses no more than 2OPT<sub>c</sub> − 1 via columns, and no more than [(4/3)OPT<sub>v</sub>] vias, where OPT<sub>c</sub> and OPT<sub>v</sub> are the number of via columns and vias in an optimal solution. We then extend our result to the case that the nets have arbitrary sizes. An efficient algorithm is presented which guarantees the solution generated uses no more than [2.5OPT<sub>c</sub>] via columns and no more than [1.5OPT<sub>v</sub>] vias in the worst case.
 
The four papers in this special section were originally presented at the IEEE Symposium on Application Specific Processors 2008.
 
The four papers in this special section are extended versions of papers presented at the 3rd ACM/IEEE Symposium on Networks-on-Chip (NOCS) in San Diego, CA, in 2009.
 
This paper describes the first study of the complete sequence from process simulation to circuit performance and the corresponding sensitivities for 0.25-μm technology. This is made possible by a combination of physically based process models and a systematic calibration involving SIMS, one-dimensional (1-D), and two-dimensional (2-D) device characteristics. Simulated nFET and pFET characteristics match hardware (HW) within 5-10% for both long-channel and nominal length devices. Simulated ring-oscillator performance is in good agreement with HW data. Sensitivities of device characteristics and the inverter gate delay to process variations (within 10%) are quantified. These investigations establish the correlation between process variations and circuit performance
 
A two-dimensional device simulator SNU-2D based on the hydrodynamic model is developed for the simulation and analysis of submicron devices. The simulator has the capacity for both self-consistent steady-state and transient-state simulation. To obtain better convergence and numerical stability, we adopt an improved discretization scheme for the carrier energy flux equation and a new strategy for the transient simulation. In steady-state simulation the new discretization scheme shows a considerable improvement in convergence rate and numerical accuracy compared with the existing schemes. A transient simulation study is carried out on a deep submicron n-MOSFET used in the sense amplifier of SRAM cells to investigate the gate-switching characteristic. It is found that the behavior of carrier temperature is quasi-static during the switching time even for very fast switching speed, while the behavior of impact ionization under transient mode deviates from that under dc mode as the switching speed increases
 
The aliasing probabilities of multiple-input signature registers (MISR) with m inputs for a 2<sup>m</sup>-ary symmetric channel, where each of the (2<sup>m</sup>-1) possible errors is equally likely, are analyzed. For this error model, the aliasing probabilities of MISRs are analyzed using the weight distributions of maximum-distance-separable (MDS) codes. The results show that the aliasing probabilities over the 2<sup>m</sup>-ary symmetric channel do not depend on the polynomials that characterize the MISRs. That is, for the 2<sup>m</sup>-ary symmetric channel, the aliasing probability of an MISR based on a primitive polynomial is exactly the same as one based on a nonprimitive one. In addition, it is observed that the aliasing probabilities, P<sub>al</sub>(n), as a function of test length n, are monotonic for error probabilities p = 0.2, 0.4, and 0.8. The aliasing probabilities of multiple MISRs based on Reed-Solomon codes are analyzed again for the 2<sup>m</sup>-ary symmetric channel, using the weight distributions of Reed-Solomon codes, which are MDS codes.
 
The application of 2<sup>N</sup> trees to device model approximation is described. The domain of the device model function is partitioned using a 2<sup>N</sup> tree, with smaller partitions where the function is more nonlinear. The function value associated with each corner of each partition is precomputed, and the function is evaluated at a given point by interpolation over the smallest partition that includes that point. This technique has the advantage that highly nonlinear functions can be modeled with modest space and time requirements. Exponential functions, such as the subthreshold behavior of FETs, can be accurately modeled. Accuracy levels of 1% are possible down to currents of 10<sup>-11</sup> A. Table generation time is small; it is only a few minutes for a MOSFET model including subthreshold effects. This algorithm is especially suited for application to a hardware-accelerated device model evaluator. The design of a prototype that is capable of performing a device model evaluation of a SPICE level-2 model including subthreshold effects in 1 μs is described. Less detailed models, such as timing simulator models, can be evaluated in as little as 0.2 μs.
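
The partition-and-interpolate idea can be sketched for the one-dimensional case (N = 1, a binary tree). This is a minimal illustration rather than the paper's implementation: the device function, its parameters, the midpoint refinement test, and all function names are assumptions. An interval is split whenever linear interpolation between its two corners misses the true value at the midpoint by more than a relative tolerance.

```python
import math


def device_current(v):
    """Hypothetical smooth device curve: exponential below v = 0.5, quadratic above."""
    return math.log1p(math.exp((v - 0.5) / 0.04)) ** 2


def build(f, lo, hi, tol, depth=0, max_depth=20):
    """Recursively partition [lo, hi]; leaves store precomputed corner values."""
    f_lo, f_hi = f(lo), f(hi)
    mid = 0.5 * (lo + hi)
    f_mid = f(mid)
    interp = 0.5 * (f_lo + f_hi)            # linear interpolation at the midpoint
    if depth >= max_depth or abs(interp - f_mid) <= tol * abs(f_mid):
        return (lo, hi, f_lo, f_hi)         # leaf: 4-tuple
    return (mid,                            # internal node: 3-tuple
            build(f, lo, mid, tol, depth + 1, max_depth),
            build(f, mid, hi, tol, depth + 1, max_depth))


def evaluate(node, x):
    """Descend to the smallest partition containing x, then interpolate linearly."""
    while len(node) == 3:
        mid, left, right = node
        node = left if x < mid else right
    lo, hi, f_lo, f_hi = node
    t = (x - lo) / (hi - lo)
    return f_lo + t * (f_hi - f_lo)


def leaf_count(node):
    """Number of partitions in the tree."""
    return 1 if len(node) == 4 else leaf_count(node[1]) + leaf_count(node[2])


if __name__ == "__main__":
    tree = build(device_current, 0.0, 1.5, tol=0.01)
    print(leaf_count(tree), evaluate(tree, 0.3), device_current(0.3))
```

With a relative tolerance the tree refines most heavily in the exponential region, where the curve is most nonlinear on a relative scale, and uses wider partitions in the quadratic region. Evaluation needs only a tree descent and one interpolation, which is what makes the scheme attractive for a hardware evaluator.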
 
An engineering model of the short-channel NMOS transistor which is applicable to both room-temperature and cryogenic device operation is presented. The model incorporates the nonuniversal dependence of the effective channel mobility on the effective vertical field, which is ignored in room-temperature device models. Described also is a novel method to account for the bulk charge effect in the presence of drift velocity saturation, channel length modulation, charge sharing by the drain and source, and temperature dependence of the critical field. The proposed model is verified by comparison with experimental device characteristics obtained over a wide range of terminal voltages, temperatures, and channel lengths
 
BELLMAC-32A is a single-chip fully 32-bit high-end microprocessor designed in 2.5-μm twin-tub CMOS technology. This paper describes the gate matrix layout of random control logic in BELLMAC-32A with top-down hierarchical design methodology. The gate matrix layout provided (1) parallel team layout efforts, (2) adaptability to evolving logic design with short turnaround time, (3) high packing density competitive with hand layout, (4) compatibility with computer-aided layout and verification tools, (5) capability to fine-tune circuits, and (6) technology updatability. It took 6.5 engineer-years to complete the layout of random control logic with 7000 transistors although the logic design was continuously evolving during the layout period. The average packing density of gate matrix layout was 1500 μm<sup>2</sup> per transistor in random logic and 840 μm<sup>2</sup> per transistor in data path. BELLMAC-32A had more than three times the performance of its 3.5-μm technology prototype chip BELLMAC-32, in which random control logic was implemented with polycells.
 
The paper presents a novel strategy aimed at modeling instruction energy consumption of 32-bit microprocessors. Different from former approaches, the proposed instruction-level power model is founded on a functional decomposition of the activities accomplished by a generic microprocessor. The proposed model has significant generalization capabilities. It allows the power figures of the entire instruction set to be estimated from the analysis of a subset, and it allows new processors to be power characterized using the model obtained from other microprocessors. The model is formally presented and justified and its actual application over five commercial microprocessors is included. This static characterization is the basic information for system-level power modeling of hardware/software architectures.
 
Algorithms for general surface advancement, three-dimensional visibility, and convolution over a surface have been developed and coupled with physical models for pattern transfer. The resulting program, SAMPLE-3D, allows practical simulation of plasma etching and deposition processes on engineering workstations. The physical models are 3-D extensions of 2-D string and segment based models. The models include secondary effects, such as material density variations and damage enhanced etching. A general facet motion algorithm supports simple, isotropic, cosine-directional, and general surface orientation dependent processes. A 3-D grid of rectangular prismatic cells, which is updated by the advancing surface, contains an alternate topography representation for fast shadow and visibility calculation. The program is organized as a collection of modular functions for continued model and algorithm development. Guidelines for estimating CPU and memory requirements for various models and simulation cases are based on an analysis of the algorithms and data structures. Simple processes, such as lithography development, require 1-5 min of CPU time. Simulations involving integration over flux distributions, such as plasma etching and sputter deposition, require from 5-30 min for typical cases. Reflection or surface migration calculations require from 30-60 min. Physical memory of 4-32 megabytes is sufficient for many practical simulations
 
A three-dimensional (3-D) process simulator, OPUS/3D, has been developed. It has access to two-dimensional (2-D) process simulation results. Impurity profiles and structural data simulated rigidly in 2-D space are expanded to 3-D space at an arbitrary stage of the simulation processes. 3-D simulation follows until the end of the process. The access to 2-D simulation results enables OPUS/3D to handle curved boundaries as seen in field oxides. OPUS/3D has three different methods for expanding 2-D results. Errors due to these expansions are discussed. OPUS/3D was applied to 3-D simulations of MOS devices at the corner edge of the active region and in the channel region near the source/drain. Computation time was drastically reduced by replacing part of the 3-D simulations by 2-D simulations. Some 3-D effects are confirmed by OPUS/3D
 
We present PMC-3D, a parallel three-dimensional (3-D) Monte Carlo device simulator for multiprocessors. The parallel algorithm is an extension of the standard Monte Carlo device simulation model in 3-D, in which the particle dynamics generated from the stochastic Monte Carlo method are solved simultaneously with Poisson's equation on a 3-D mesh using finite differences. Due to the large computational requirements of 3-D device simulation, it is necessary to parallelize both the Poisson solver and the Monte Carlo simulation phase of the device simulator. The parallel algorithms were implemented on a 1024-node distributed memory nCUBE multicomputer and a 4-node shared memory Ardent multiprocessor. We validate the accuracy of our implementations by generating the static characteristics of a MESFET and present test results on the fixed and scaled speedups obtained on the two types of parallel computers. Improvements in performance are observed utilizing dynamic load balancing for the distributed memory case.
 
The 3D packing problem consists of arranging nonoverlapping rectangular boxes (blocks) of given sizes in a rectangular box of minimum volume. As a representation of 3D packings, this paper proposes a novel encoding method called Double Tree and Sequence (DTS). The following are features of DTS: 1) It can represent any minimal packing. 2) It can be decoded into the corresponding 3D packing in O(n<sup>2</sup>) time, where n is the number of rectangular boxes. 3) The size of the solution space (the number of codes) of DTS is significantly smaller than any conventional representation that can represent any packing. Experimental comparisons with conventional representations indicate the superiority of the proposed representation DTS.
 
This paper presents a novel three-dimensional (3D) thermal simulation tool for semiconductor integrated devices. The simulator is used to automatically generate an accurate 3D physical model of the device to be simulated from layout information. The simulator produces an appropriate mesh of the device based on a rectangular block structure. The mesh is automatically created such that a fine mesh is produced around heat generation regions, but a moderate number of blocks are used for the entire device. This paper first confirms that the simulator produces an accurate solution to the nonlinear differential equation describing the heat flow. Then model generation from three example technologies (silicon trench, GaAs mesa structures, silicon on insulator) is presented. The potential of the simulator to quickly and easily explore the effect of layout and process variations is illustrated, with the simulation of a two-transistor GaAs power cell as a large example. The program incorporates a transient solver based on a transmission line matrix (TLM) implementation using a physical extraction of a resistance and capacitance network. The formulation allows for temperature dependent material parameters and a nonuniform time stepping. An example of a full transient solution of heat flow in a realistic Si trench device is presented
 
The problem of detection and identification of a faulty processing element in a systolic array is addressed. A method for designing processing elements with concurrent error detection is presented. The |gAN|<sub>M</sub> code is shown to be an effective code for encoding the operands in a systolic array. It is shown that the |g3N|<sub>M</sub> code is equivalent to a residue code with the check and information bits interchanged, for an odd number of information bits. This allows arithmetic to be performed separately on the information and check bits while the output can be checked by an AN checker. An architecture and rules for designing a self-checking processing element (PE) for systolic arrays are presented. Both redundancy and extra delay of the self-checking PE are shown to be low.
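
As background for the coding scheme, the sketch below shows the two classical arithmetic codes the abstract relates to each other: a plain AN code with A = 3 and a mod-3 residue code. It does not implement the |gAN|<sub>M</sub> code itself, and all function names are assumptions. Both codes are preserved by addition, which is what lets a checker validate the output of an adder.

```python
A = 3  # check base used for both codes in this sketch


def an_encode(n):
    """AN code: the code word is A times the operand."""
    return A * n


def an_check(word):
    """A valid AN code word is divisible by A."""
    return word % A == 0


def residue_encode(n):
    """Residue code: keep the operand and carry its residue mod A as check bits."""
    return (n, n % A)


def residue_add(x, y):
    """Add information parts and check parts separately."""
    return (x[0] + y[0], (x[1] + y[1]) % A)


def residue_check(word):
    """The check part must equal the residue of the information part."""
    return word[0] % A == word[1]


if __name__ == "__main__":
    total = an_encode(5) + an_encode(7)      # 15 + 21 = 36, still a multiple of 3
    print(an_check(total))                   # valid sum passes the check
    print(an_check(total ^ 4))               # a single flipped bit is detected
    print(residue_check(residue_add(residue_encode(5), residue_encode(7))))
```

A single-bit error changes a code word by plus or minus a power of two, which is never a multiple of 3, so the A = 3 check detects every such error.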
 
Programmable logic devices such as field-programmable gate arrays (FPGAs) are useful for a wide range of applications. However, FPGAs are not commonly used in battery-powered applications because they consume more power than application-specific integrated circuits and lack power management features. In this paper, we describe the design and implementation of Pika, a low-power FPGA core targeting battery-powered applications. Our design is based on a commercial low-cost FPGA and achieves substantial power savings through a series of power optimizations. The resulting architecture is compatible with existing commercial design tools. The implementation is done in a 90-nm triple-oxide CMOS process. Compared to the baseline design, Pika consumes 46% less active power and 99% less standby power. Furthermore, it retains circuit and configuration state during standby mode and wakes up from standby mode in approximately 100 ns.
 
Recounts the life and career of A. Richard Newton and provides an excerpt of his keynote address at the 1995 Design Automation Conference and an unabridged presentation of his address to the Berkeley EECS Annual Research Symposium on February 23, 2006.
 
A new (AlGa)As/GaAs MODFET integrated circuit simulator is described. Our simulator is a customized version of SPICE incorporating the extended charge control model for MODFET's and the velocity saturation model for ungated FET's used as the load devices. Comparison of our simulator results with the measured data shows good agreement for both dc and transient responses. We also propose a set of analytically derived design guidelines for MODFET inverter stages. Design parameters such as the optimum ratio of driver to load saturation currents, noise margins and switching delays can be readily related to the device process parameters. Our simulation results indicate that the inverter speed increases with increasing driver threshold voltages but there is an optimum threshold voltage, of approximately 0.4 V for our devices, which provides the highest noise margin.
 
Alternating-aperture phase shift masking (AAPSM), a form of strong resolution enhancement technology, will be used to image critical features on the polysilicon layer at smaller technology nodes. This technology imposes additional constraints on the layouts beyond traditional design rules. Of particular note is the requirement that all critical features be flanked by opposite-phase shifters while the shifters obey minimum width and spacing requirements. A layout is called phase assignable if it satisfies this requirement. Phase conflicts have to be removed to enable the use of AAPSM for layouts that are not phase assignable. Previous work has sought to detect a suitable set of phase conflicts to be removed as well as correct them. This paper has two key contributions: 1) a new computationally efficient approach to detect a minimal set of phase conflicts, which when corrected will produce a phase-assignable layout, and 2) a novel layout modification scheme for correcting these phase conflicts with small layout area increase. Unlike previous formulations of this problem, the proposed solution for the conflict detection problem does not frame it as a graph bipartization problem. Instead, a simpler and more computationally efficient reduction is proposed. This simplification greatly improves the runtime while maintaining the same improvements in the quality of results obtained in Chiang (Proc. DATE, 2005, p. 908). An average runtime speedup of 5.9× is achieved using the new flow. A new layout modification scheme suited for correcting phase conflicts in large standard-cell blocks is also proposed. The experiments show that the percentage area increase for making standard-cell blocks phase assignable ranges from 1.7% to 9.1%.
 
In this paper, the transient response of arbitrarily terminated nonuniform transmission lines with frequency-dependent parameters is analyzed by the introduction of ABCD matrices and the waveform relaxation (WR) method. A differential equation describing the ABCD matrices of nonuniform transmission lines is derived and then solved efficiently with the WR method. A convergence theorem is proven, according to which the nonuniform transmission line is segmented into a number of cascaded subnetworks to increase the convergence speed. An example nonuniform transmission system is analyzed. The results are comparable to those of the convolution-characteristics method.
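
The segmentation into cascaded subnetworks rests on the fact that ABCD matrices of cascaded two-ports multiply. The sketch below illustrates only that property, at a single frequency and for lossless segments; it does not implement the WR method or frequency-dependent parameters, and the linear impedance taper and all function names are assumptions.

```python
import math


def abcd_lossless(z0, theta):
    """ABCD matrix of a lossless uniform line segment.

    z0: characteristic impedance in ohms; theta: electrical length in radians.
    """
    c, s = math.cos(theta), math.sin(theta)
    return ((complex(c), 1j * z0 * s),
            (1j * s / z0, complex(c)))


def cascade(m1, m2):
    """ABCD matrix of two-port m1 followed by two-port m2 (2x2 matrix product)."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))


def tapered_line(z_start, z_end, theta_total, segments):
    """Approximate a linearly tapered line by cascading short uniform segments."""
    total = ((1 + 0j, 0j), (0j, 1 + 0j))          # identity two-port
    for k in range(segments):
        frac = (k + 0.5) / segments               # sample impedance at segment centre
        z0 = z_start + frac * (z_end - z_start)
        total = cascade(total, abcd_lossless(z0, theta_total / segments))
    return total


if __name__ == "__main__":
    (a, b), (c, d) = tapered_line(50.0, 100.0, math.pi / 2, 64)
    print(a * d - b * c)   # close to 1 for a reciprocal network
```

Increasing the number of segments refines the staircase approximation of the taper, at the cost of more matrix products.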
 
With continuous advances in radio-frequency (RF) mixed-signal very large scale integration (VLSI) technology, the creation of eddy currents in lossy multilayer substrates has made the already complicated interconnect analysis and modeling issue more challenging. To account for substrate losses, traditional electromagnetic methods are often computationally prohibitive for today's VLSI geometries. In this paper, an accurate and efficient interconnect modeling approach, the eddy-current-aware partial equivalent element circuit (EPEEC), is proposed. Based on complex image theory, it extends the traditional partial equivalent element circuit (PEEC) model to simultaneously take multilayer substrate eddy-current losses and frequency-dependent effects into consideration. To accommodate even larger scale on-chip interconnect networks, EPEEC develops a new simulation program with integrated circuit emphasis (SPICE)-compatible reluctance extraction algorithm by applying sparsification in the inverse inductance domain with an extended window algorithm. Compared with several industry standard inductance and full-wave solvers, such as FastHenry and Sonnet, EPEEC demonstrates within 1.5% accuracy while providing over 100× speedup.
 
The design of checkers aimed at the concurrent test of analog and mixed-signal circuits is considered in this paper. These checkers can on-line test duplicated and fully differential analog circuits. The test approach is based on exploiting the inherent redundancy of these circuits which results in the use of a code for the analog signals. The analog code is monitored by the checkers. An error signal which complies with existing digital self-checking parts is generated in the case that a code falls out of the valid code space. For the verification of the analog codes, absolute tolerance margins and tolerance margins which are made relative to signal amplitude are considered. A test pattern generator for off-line testing of the checkers is proposed.
 
Abstraction-refinement loop in this paper.
As a first step, most model checkers used in the hardware industry convert a high-level register-transfer-level (RTL) design into a netlist. However, algorithms that operate at the netlist level are unable to exploit the structure of the higher abstraction levels and, thus, are less scalable. The RTL of a hardware description language such as Verilog is similar to a software program with special features for hardware design such as bit-vector arithmetic and concurrency. This paper uses predicate abstraction, a software verification technique, for verifying RTL Verilog. There are two challenges when applying predicate abstraction to circuits: 1) the computation of the abstract model in presence of a large number of predicates and 2) the discovery of suitable word-level predicates for abstraction refinement. We address the first problem using a technique called predicate clustering. We address the second problem by computing the weakest preconditions of Verilog statements in order to obtain new word-level predicates during abstraction refinement. We compare the performance of our technique with localization reduction, a netlist-level abstraction technique, and report improvements on a set of benchmarks.
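
The weakest-precondition step used for refinement can be illustrated with the standard rule for a single assignment. This is a generic textbook instance, not the paper's treatment of Verilog semantics; the variable x and the predicate are hypothetical.

```latex
\begin{align*}
\mathrm{wp}(x := e,\; P) &= P[e/x], \\
\mathrm{wp}(x := x + 1,\; x \le 8) &= (x + 1 \le 8) \;\equiv\; (x \le 7).
\end{align*}
```

Starting from a predicate over the next state (x ≤ 8), substitution yields a new word-level predicate over the current state (x ≤ 7), which can then be added to the abstraction.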
 
Simple example of SystemC and BS-ESL.  
Sample circuit. (a) Circuit. (b) Hypergraph.
SystemC's growing community for system-level design exploration is a result of SystemC's capability of modeling at register transfer level (RTL) and above RTL abstraction levels. However, a synthesis path from SystemC at abstraction layers above RTL is still in its infancy. A recent extension of SystemC, which is called Bluespec-SystemC electronic system level (BS-ESL), counters this difficulty with its model of computation employing atomic rule-based specifications and synthesis to Verilog. In order to simulate a model consisting of one part designed in SystemC and another using BS-ESL, we require an interoperability semantics and implementation of such a semantics. To illustrate the problem, we formalize the simulation semantics of BS-ESL and discrete-event simulation of RTL SystemC, and provide a solution based on this formalization.
 
The effect of the generational refinement, with the refinement minimization. 
Illustration of the winning position. 
We propose an abstraction refinement method for invariant checking, which is based on the simultaneous analysis of all abstract counterexamples of shortest length in the current abstraction. The algorithm is focused on an improved Ariadne's Bundle of SORs (Synchronous Onion Rings) of the abstract model; the transitions through these SORs contain all shortest ACEs (Abstract Counterexamples) and no other ACEs. The SORs are exploited in two distinct ways to provide global guidance to the abstraction refinement process: (1) Refinement variable selection is based on the entirety of transitions connecting the SORs, and (2) a SAT-based concretization test is formulated to test all ACEs in the SORs at once. We call this test multi-thread concretization. The scalability of our refinement algorithm is ensured in the sense that all the analysis and computation required in our refinement algorithm are conducted on the abstract model. The abstraction efficiency of a given abstraction refinement algorithm measures how much of the concrete model is required to make the decision. We include experimental comparisons of our new method with recently published techniques. The results show that our scalable method, based on global guidance from the entire bundle of shortest ACEs, outperforms these other methods in terms of both run time and abstraction efficiency.
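The idea of rings that contain all shortest counterexamples and no others can be sketched on an explicit state graph. This is only an illustration under stated assumptions: the paper works on the abstract model (presumably symbolically), whereas the sketch below uses an explicit adjacency dictionary, and the names `bfs_distance`, `onion_rings`, and the example graph are hypothetical. A state belongs to ring `i` when it is `i` steps from an initial state and `k - i` steps from a bad state, where `k` is the length of the shortest counterexample.

```python
from collections import deque


def bfs_distance(edges, sources):
    """Shortest step count from any source to every reachable state."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in edges.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def onion_rings(edges, init, bad):
    """Rings of states lying on some shortest path from init to bad."""
    reverse = {}
    for u, succs in edges.items():
        for v in succs:
            reverse.setdefault(v, []).append(u)
    fwd = bfs_distance(edges, list(init))
    reachable_bad = [s for s in bad if s in fwd]
    if not reachable_bad:
        return []                      # no counterexample exists in this model
    k = min(fwd[s] for s in reachable_bad)
    bwd = bfs_distance(reverse, list(bad))
    return [{s for s in fwd if fwd[s] == i and bwd.get(s) == k - i}
            for i in range(k + 1)]


# Hypothetical model: two shortest error paths (via s1 or s2) and one longer one.
edges = {
    "s0": ["s1", "s2"],
    "s1": ["s3"],
    "s2": ["s3", "s4"],
    "s3": ["bad"],
    "s4": ["s5"],
    "s5": ["bad"],
}
rings = onion_rings(edges, ["s0"], {"bad"})
```

For this graph the rings are `{s0}`, `{s1, s2}`, `{s3}`, `{bad}`; the longer path through `s4` and `s5` is excluded, so any analysis restricted to the rings sees every shortest counterexample at once.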
 
We present a network interface (NI) for an on-chip network. Our NI decouples computation from communication by offering a shared-memory abstraction, which is independent of the network implementation. We use a transaction-based protocol to achieve backward compatibility with existing bus protocols such as AXI, OCP, and DTL. Our NI has a modular architecture, which allows flexible instantiation. It provides both guaranteed and best-effort services via connections. These are configured via NI ports using the network itself, instead of a separate control interconnect. An example instance of this NI with four ports has an area of 0.25 mm<sup>2</sup> after layout in 0.13-μm technology, and runs at 500 MHz.
 
We describe new techniques for model checking in the counterexample-guided abstraction-refinement framework. The abstraction phase "hides" the logic of various variables, hence considering them as inputs. This type of abstraction may lead to "spurious" counterexamples, i.e., traces that cannot be simulated on the original (concrete) machine. We check whether a counterexample is real or spurious with a satisfiability (SAT) checker. We then use a combination of 0-1 integer linear programming and machine learning techniques for refining the abstraction based on the counterexample. The process is repeated until either a real counterexample is found or the property is verified. We have implemented these techniques on top of the model checker NuSMV and the SAT solver Chaff. Experimental results prove the viability of these new techniques.
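The loop described in this abstract (abstract, check the counterexample, refine, repeat) can be sketched on a toy model. This sketch substitutes simpler pieces for the paper's techniques and should be read with that in mind: the spuriousness check is done by direct simulation of a deterministic model rather than with a SAT solver, and refinement naively reveals the first hidden variable rather than using integer linear programming or machine learning. All names (`abstract_counterexample`, `is_real`, `cegar`, and the four-variable model) are hypothetical.

```python
from itertools import product


def abstract_counterexample(nxt, init, bad, visible, hidden):
    """BFS over visible variables only; hidden ones act as free inputs."""
    start = tuple(init[v] for v in visible)
    parent = {start: None}
    frontier = [start]
    while frontier:
        new_frontier = []
        for cur in frontier:
            vis = dict(zip(visible, cur))
            if bad(vis):
                trace = []
                while cur is not None:        # walk parents back to the start
                    trace.append(cur)
                    cur = parent[cur]
                return trace[::-1]
            for free in product((0, 1), repeat=len(hidden)):
                state = {**vis, **dict(zip(hidden, free))}
                succ = tuple(nxt[v](state) for v in visible)
                if succ not in parent:
                    parent[succ] = cur
                    new_frontier.append(succ)
        frontier = new_frontier
    return None


def is_real(nxt, init, visible, trace):
    """Simulate the concrete (deterministic) model and compare with the trace."""
    state = dict(init)
    for step in trace:
        if tuple(state[v] for v in visible) != step:
            return False
        state = {v: f(state) for v, f in nxt.items()}
    return True


def cegar(nxt, init, bad, visible):
    visible = list(visible)
    while True:
        hidden = [v for v in nxt if v not in visible]
        trace = abstract_counterexample(nxt, init, bad, visible, hidden)
        if trace is None:
            return "verified", visible
        if is_real(nxt, init, visible, trace):
            return "counterexample", trace
        visible.append(hidden[0])             # naive refinement (assumption)


# Hypothetical model: a can only rise when b and c are both 1, which never happens.
nxt = {
    "a": lambda s: s["a"] | (s["b"] & s["c"]),
    "b": lambda s: s["c"],
    "c": lambda s: s["b"],
    "d": lambda s: 1 - s["d"],                # irrelevant to the property
}
init = {"a": 0, "b": 0, "c": 1, "d": 0}
result = cegar(nxt, init, lambda s: s["a"] == 1, ["a"])
```

On this model the loop finds two spurious counterexamples, reveals `b` and then `c`, and proves the property while `d` stays hidden, so `result` is `("verified", ["a", "b", "c"])`.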
 
The TPN for the entire two stage STARI circuit.
The composition of TPNs for CLK, TX, RX, and Stage2 from the STARI example to form the environment for Stage1.
Environment for Stage1 after removing $x2.t¢ and $x2.t£ using Transformation 1.
The major barrier that prevents the application of formal verification to large designs is state explosion. This paper presents a new approach for verification of timed circuits using automatic abstraction. This approach partitions the design into modules, each with constrained complexity. Before verification is applied to each individual module, irrelevant information to the behavior of the selected module is abstracted away. This approach converts a verification problem with big exponential complexity to a set of subproblems, each with small exponential complexity. Experimental results are promising in that they indicate that our approach has the potential of completing much faster while using less memory than traditional flat analysis.
 
This paper proposes a novel approach to solve the allocation and scheduling problems for variable voltage/frequency multiprocessor systems-on-chip, which minimizes overall system energy dissipation. The optimality of derived system configurations is guaranteed, while the computation efficiency of the optimizer allows for solving problem instances that were traditionally considered beyond reach for exact solvers (optimality gap). Furthermore, this paper illustrates the development- and run-time software infrastructures that assist the user in developing applications and implementing optimizer solutions. The proposed approach guarantees a high level of power, performance, and constraint satisfaction predictability, as confirmed by validation on the target platform, thus bridging the abstraction gap.
 
In this paper, we propose a data structure for abstracting the delay information of a combinatorial circuit. The particular abstraction that we are interested in is one that preserves the delays between all pairs of inputs and outputs in the circuit. Such abstractions are useful when considering the delay of cascaded circuits in high-level synthesis and other such applications in synthesis. The proposed graphical data structure is called the concise delay network, and is of size proportional to (m+n) in the best case, where m and n refer to the number of inputs and outputs of the circuit. In comparison, a delay matrix that stores the maximum delay between each input-output pair has size proportional to m×n. For circuits with hundreds of inputs and outputs, this storage and the associated computations become quite expensive, especially when they need to be done repeatedly during synthesis. We present heuristic algorithms for deriving these concise delay networks. Experimental results show that, in practice, we can obtain concise delay networks with the number of edges being a small multiple of (m+n).
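The size comparison between a delay matrix and a concise delay network can be made concrete with a small sketch. It illustrates only the best case named in the abstract, not the paper's heuristic construction: it assumes a hypothetical circuit whose input-to-output delays all pass through a single internal node `h`, so the network needs exactly m+n weighted edges. The function name `longest_delays` and the example weights are assumptions.

```python
from collections import defaultdict


def longest_delays(edges, source):
    """Longest-path delay from `source` to every reachable node of a DAG."""
    graph = defaultdict(list)
    for u, v, w in edges:
        graph[u].append((v, w))

    order, seen = [], set()

    def visit(u):                      # depth-first post-order
        if u in seen:
            return
        seen.add(u)
        for v, _ in graph[u]:
            visit(v)
        order.append(u)

    visit(source)
    delay = {source: 0}
    for u in reversed(order):          # reverse post-order is a topological order
        for v, w in graph[u]:
            delay[v] = max(delay.get(v, float("-inf")), delay[u] + w)
    return delay


inputs, outputs = ["a", "b", "c"], ["y", "z"]     # m = 3, n = 2

# Concise delay network: m + n = 5 edges through one internal node "h".
edges = [("a", "h", 2), ("b", "h", 5), ("c", "h", 1),
         ("h", "y", 3), ("h", "z", 4)]

# The full delay matrix (m * n = 6 entries) is recovered on demand.
matrix = {}
for i in inputs:
    delays = longest_delays(edges, i)
    for o in outputs:
        matrix[i, o] = delays[o]
```

Here `matrix` holds six entries, for example `matrix["b", "z"] == 9`, while the network stores five edges; the gap widens quickly as m and n grow into the hundreds.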
 
In this paper, an algorithm is presented for the verification of the equivalence of two sequential circuit descriptions at the same or differing levels of abstraction, namely at the register-transfer (RT) level and the logic level. The descriptions represent general finite automata at the differing levels -- a finite automaton can be described in an ISP-like language and its equivalence to a logic level implementation can be verified using our algorithm. Two logic level automata can be similarly verified for equivalence. Previous approaches to sequential circuit verification have been restricted to verifying relatively simple descriptions with small amounts of memory. Unlike these approaches, our technique is shown to be computationally efficient for much more complex circuits. The efficiency of our algorithm lies in the exploitation of don't care information derivable from the RTL or logic level description (e.g., invalid input and output sequences) during the verification process. Using efficient cube enumeration procedures at the logic level, we have been able to verify the equivalence of finite automata with a large number of states in small amounts of CPU time.
 
Design debugging is one of the major remaining manual processes in the semiconductor design cycle. Despite recent advances in the area of automated design debugging, more effort is required to cope with the size and complexity of today's designs. This paper introduces an abstraction and refinement methodology to enable current debuggers to operate on designs that are orders of magnitude larger than otherwise possible. Two abstraction techniques are developed with the goals of improving debugger performance for different circuit structures: State abstraction is aimed at reducing the problem size for circuits consisting purely of primitive gates, while function abstraction focuses on designs that also contain modular and hierarchical information. In both methods, after an initial abstracted model is created, the problem can be solved by an existing automated debugger. If an error site is abstracted, refinement is necessary to reintroduce some of the abstracted components back into the design. This paper also presents the underlying theory to guarantee correctness and completeness of a debugging tool that operates using the proposed methodology. Empirical results demonstrate improvements in run time and memory capacity of two orders of magnitude over a state-of-the-art debugger on a wide range of benchmark and industrial designs.
 
This paper presents a compositional method with failure-preserving abstraction for scalable asynchronous design verification. It combines efficient state-space reductions and novel interface refinement and can dramatically reduce the complexity of state space while decreasing the introduction of false failures. This allows much larger designs to be verified as demonstrated in the experimental results.
 
Multilevel simulation is a technique for specifying and testing VLSI chip designs. This paper compares three methods of implementing such simulators: (a) Programming using a high level language that does not have data abstraction, e.g., Pascal. (b) Programming with procedure oriented data abstractions as in Simula [3] or CLU [4]. (c) Programming with enhanced C's (EC) [2] data abstractions that are macro oriented. LIST -- a generator for simulation programs based on the data abstraction technique is described briefly. It is concluded that the use of data abstractions offers structure, hierarchy and simplicity. The use of EC has the additional advantage of run-time efficiency.
 
Safe transformation 2.
Block diagram for RAPPID circuit.  
We present a method to address state explosion in timed circuit verification by using abstraction directed by the failure model. This method allows us to decompose the verification problem into a set of subproblems, each of which proves that a specific failure condition does not occur. To each subproblem, abstraction is applied using safe transformations to reduce the complexity of verification. The abstraction preserves all essential behaviors conservatively for the specific failure model in the concrete description. Therefore, no violations of the given failure model are missed when only the abstract description is analyzed. An algorithm is also shown to examine the abstract error trace to either find a concrete error trace or report that it is a false negative. We present results using the proposed failure directed abstractions as applied to two large timed circuit designs.
 
Aggregation abstraction is a way of defining a desired correspondence between an implementation of a transaction-oriented protocol and a much simpler idealized version of the same protocol. This relationship can be formally verified to prove the correctness of the implementation. We present a technique for checking aggregation abstractions automatically using a finite-state enumerator. The abstraction relation between implementation and specification is checked on-the-fly and the verification requires examining no more states than checking a simple invariant property. This technique can be used alone for verification of finite-state protocols, or as preparation for a more general aggregation proof using a general-purpose theorem-prover. We illustrate the technique on the cache coherence protocol used in the FLASH multiprocessor system.
 
In floorplanning, it is common that a designer wants to have certain modules abutting one another in the final packing. The problem of controlling the relative positions of an arbitrary number of modules in floorplan design is nontrivial. A slicing floorplan has an advantageous feature in that the topological structure of the packing can be found without knowing the module dimensions. This feature is good for handling placement constraints in general. In this paper, we make use of it to solve the abutment problem in the presence of L- and T-shaped modules. This is done by a procedure which explores the topological structure of the packing and finds the neighborhood relationship between every pair of modules in linear time. Our main contribution is a method that can handle abutment constraints in the presence of L- or T-shaped modules in such a way that the shape flexibility of the soft modules can still be fully exploited to obtain a tight packing. We tested our floorplanner with some benchmark data and the results are promising.
 
AC testing of integrated circuits and assemblies is gaining importance as the trend continues in favor of fewer defects shipped and use of higher performance technologies. While there is a large body of literature on test generation and fault simulation related to ac test, the optimization of timing on the tester has been unexplored. This paper defines the problems associated with optimization of the test application timing for a class of test equipment. Two approaches to test application timing are introduced. The notion of slack is used to define the objective function for optimization. The optimization problem is shown to be NP-complete even for non-reconvergent fanout circuits. Heuristics are presented for the optimization problems and the results compared with bounds on test circuits.
 
Top-cited authors
Alberto L. Sangiovanni-Vincentelli
  • University of California, Berkeley
Andrew Kahng
  • University of California, San Diego
Krishnendu Chakrabarty
  • Duke University
Jason Cong
  • University of California, Los Angeles
Sudhakar Reddy
  • University of Iowa