# IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Published by Institute of Electrical and Electronics Engineers

Print ISSN: 0278-0070

Publications

This paper reports a unified triode/saturation model with
improved continuity in the output conductance, suitable for CAD of VLSI
circuits using deep sub-0.1 μm NMOS devices. As verified by
experimental data, the model accurately predicts the output
conductance characteristics.

…

Simulation and experimental results are presented for an active-noise-suppression technique to reduce substrate crosstalk in mixed-signal IC technology. The method utilizes a 3-D distributed resistive-capacitive substrate model, along with a BiCMOS wideband differential noise suppression amplifier (NSA) designed in IBM's 0.18-μm 7WL BiCMOS technology. Simulation results for a GR-defined “quiet” region predict a noise suppression factor of -6 dB over a frequency range of 10 MHz-2 GHz at the center of the region with a peak suppression of -14 dB at the point of the NSA connection to the guard ring (GR). BF-Moat and P<sup>+</sup> region/deep trench/n-well GR isolation structures were also integrated into the 3-D substrate model, for investigation of their isolation ability. The simulated substrate-noise suppression and isolation results were verified with an experimental test site, designed and fabricated in 7WL technology. Measurements of both noise suppression and isolation factors were compared to the simulation results and to predictions derived from an analytical model.

…

A behavioral model of a 1.8-V, 6-bit flash analog-to-digital
converter has been developed based on device parameters using the
g<sub>m</sub>/I<sub>d</sub> methodology. This approach eliminates the
need for recharacterization of blocks when device sizes are changed.
Furthermore, the performance can be predicted with input only from
device and process simulators eliminating the need for a circuit
simulator and associated model parameters. Signal to noise plus
distortion ratio and differential and integral nonlinearity are
predicted and verified at lower resolution with a circuit
simulator.

…

We present a methodology for the watermarking of synchronous
sequential circuits that makes it possible to identify the authorship of
designs by imposing a digital watermark on the state transition graph
(STG) of the circuit. The methodology is applicable to sequential
designs that are made available as firm intellectual property, the
designation commonly used to characterize designs specified as
structural hardware description languages or circuit netlists. The
watermarking is obtained by manipulating the STG of the design in such a
way as to make it exhibit a chosen property that is extremely rare in
nonwatermarked circuits while, at the same time, not changing the
functionality of the circuit. This manipulation is performed without
ever actually computing this graph in either implicit or explicit form.
Instead, the digital watermark is obtained by direct manipulation of the
circuit description. We present evidence that no known algorithms for
circuit manipulation can be used to efficiently remove or change the
watermark and that the process is immune to a variety of other attacks.
We present both theoretical and experimental results that show that the
watermarking can be created and verified efficiently. We also test
possible attack strategies and verify that they are inapplicable to
realistic designs of medium to large complexity.

…

In this paper, we introduce PVL, an algorithm for computing the
Padé approximation of Laplace-domain transfer functions of large linear
networks via a Lanczos process. The PVL algorithm has significantly
superior numerical stability, while retaining the same efficiency as
algorithms that compute the Padé approximation directly through moment
matching, such as AWE and its derivatives. As a consequence, it produces
more accurate and higher-order approximations, and it renders
unnecessary many of the heuristics that AWE and its derivatives had to
employ. The algorithm also computes an error bound that makes it possible to
identify the true poles and zeros of the original network. We present
results of numerical experiments with the PVL algorithm for several
large examples.

…

In this paper, we propose a new method for the synthesis of 1-D 90/150 linear-hybrid-group cellular automata for CA-polynomials. We obtain large-cell CA very rapidly using our algorithm. This algorithm is efficient and suitable for all practical applications.
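
For readers unfamiliar with the object being synthesized, the following is a minimal sketch of a null-boundary 1-D 90/150 hybrid cellular automaton and of the standard tridiagonal recurrence for its characteristic polynomial over GF(2). It is not the synthesis algorithm of the paper (which goes the other way, from a polynomial to a rule vector); the function names and the 4-cell rule vector are illustrative assumptions.

```python
def ca_step(state, rules):
    """Advance a null-boundary 90/150 hybrid CA by one time step.

    state: list of 0/1 cell values; rules: list of 90 or 150, one per cell.
    Rule 90:  next = left XOR right
    Rule 150: next = left XOR self XOR right
    """
    n = len(state)
    nxt = []
    for i in range(n):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        own = state[i] if rules[i] == 150 else 0
        nxt.append(left ^ own ^ right)
    return nxt


def char_poly(rules):
    """Characteristic polynomial over GF(2), returned as a bitmask
    (bit k is the coefficient of x^k).

    Uses the tridiagonal recurrence D_k = (x + d_k) * D_{k-1} + D_{k-2},
    where d_k = 1 for a rule-150 cell and 0 for a rule-90 cell.
    """
    prev, cur = 0, 1
    for r in rules:
        d = 1 if r == 150 else 0
        prev, cur = cur, (cur << 1) ^ (cur if d else 0) ^ prev
    return cur


def period(state, rules):
    """Number of steps until the CA returns to `state` (None if it never does
    within 2**n steps)."""
    start = list(state)
    cur = ca_step(start, rules)
    steps = 1
    while cur != start and steps < 2 ** len(start):
        cur = ca_step(cur, rules)
        steps += 1
    return steps if cur == start else None


rules = [90, 150, 90, 150]          # illustrative 4-cell rule vector
print(bin(char_poly(rules)))        # x^4 + x + 1 as a bitmask
print(period([1, 0, 0, 0], rules))  # cycle length from a nonzero seed
```

For this rule vector the characteristic polynomial is x<sup>4</sup> + x + 1, which is primitive, so every nonzero state lies on a single cycle of length 15.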

…

An interconnect diagnosis scheme based on the oscillation ring (OR) test methodology for systems-on-chip (SOC) design with heterogeneous cores is proposed. In addition to traditional stuck-at and open faults, the OR test can also detect and diagnose important interconnect faults such as delay faults and crosstalk glitches. The large number of test rings in the SOC design, however, significantly complicates the interconnect diagnosis problem. In this paper, the diagnosability of an interconnect structure is first analyzed, and then a fast diagnosability checking algorithm and an efficient diagnosis ring generation algorithm are proposed. It is shown in this paper that the generation algorithm achieves the maximum diagnosability for any interconnect. Two optimization techniques are also proposed, an adaptive and a concurrent diagnosis method, to improve the efficiency and effectiveness of interconnect diagnosis. Experiments on the MCNC benchmark circuits show the effectiveness of the proposed diagnosis algorithms. In all experiments, the method achieves 100% fault detection coverage and the optimal interconnect diagnosis resolution.

…

We propose an interconnect diagnosis scheme based on oscillation ring test methodology for SOC design with heterogeneous cores. The target fault models are delay faults and crosstalk glitches. We analyze the diagnosability of an interconnect structure and propose a fast diagnosability checking algorithm and an efficient diagnosis ring generation algorithm which achieves the optimal diagnosability. Two optimization techniques improve the efficiency and effectiveness of interconnect diagnosis. In all experiments, our method achieves 100% fault coverage and the optimal diagnosis resolution.

…

In this paper, a time and memory-efficient diagnostic fault
simulator for sequential circuits is first presented. A distributed
diagnostic fault simulator is then presented based on the sequential
algorithm to improve the speed of the diagnostic process. In the
sequential diagnostic fault simulator, the number of fault-pair output
response comparisons has been minimized by using an indistinguishability
fault list that stores the faults that are indistinguishable from each
fault. Due to the symmetrical relationship of the fault-pair
distinguishability, fault list sizes are reduced. Therefore, the
different diagnostic measures of a given test set can be generated very
quickly using a small amount of memory. To further speed up the process
of finding the indistinguishable fault list for each fault, a
distributed approach is proposed and developed. The major idea for this
approach is that each processor constructs the indistinguishable fault
lists for a certain percentage of faults only. Experimental results show
that the sequential diagnostic fault simulator runs faster and uses less
memory than a previously developed one and that the distributed
algorithm even achieves superlinear speedup for a very large sequential
benchmark circuit, s35932. To the authors' knowledge, no distributed
diagnostic fault simulation system for sequential circuits has been
proposed before.
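
The indistinguishability-list idea can be sketched as follows. This is a simplified illustration, not the simulator described in the abstract: it assumes the complete output response of every fault is already available as a hashable value, whereas a real diagnostic fault simulator refines the lists incrementally during simulation. The function names and fault labels are hypothetical.

```python
from collections import defaultdict


def indistinguishable_lists(responses):
    """Build one indistinguishability list per fault.

    responses: dict mapping fault name -> hashable output response
               under the test set (assumed precomputed here).
    Because indistinguishability is symmetric, each fault stores only the
    faults that come after it in sorted order, roughly halving storage.
    """
    classes = defaultdict(list)
    for fault in sorted(responses):
        classes[responses[fault]].append(fault)

    lists = {}
    for members in classes.values():
        for i, fault in enumerate(members):
            lists[fault] = members[i + 1:]
    return lists


def fully_distinguished(lists):
    """Faults that are distinguished from every other fault."""
    paired = set()
    for fault, later in lists.items():
        if later:
            paired.add(fault)
            paired.update(later)
    return sorted(set(lists) - paired)


# Toy example with four hypothetical faults and two-vector responses.
responses = {"f1": (0, 1), "f2": (0, 1), "f3": (1, 1), "f4": (0, 1)}
lists = indistinguishable_lists(responses)
print(lists)
print(fully_distinguished(lists))
```

In a distributed setting along the lines the abstract describes, each processor would build these lists for its own share of the faults only.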

…

We propose a procedure for generating test sequences for diagnosis
of synchronous sequential circuits based on stuck-at faults. In this
procedure, we avoid the conventional fault-oriented test generation
process by observing that a sequence to distinguish two faults can be
obtained from a sequence T that detects both of the faults (such as a
test sequence for fault detection) by changing T so as to
“undetect” one of the faults, or to change the time units or
outputs where the fault is detected. To achieve this goal, the proposed
procedure eliminates parts of T so as to render some of the faults
undetected, or to change their detection times or outputs. In the case
where faults become undetected by the modified sequence, the detected
faults are distinguished from the faults left undetected by the modified
sequence based on pass/fail information. A pass/fail dictionary based on
modified test sequences is proposed for this case. Alternatively, a
standard dictionary can be used, and the proposed procedure can be used
to change the time units or outputs where faults are detected in order
to distinguish them. We present experimental results to demonstrate the
levels of resolution that can be obtained by the proposed procedure with
the proposed pass/fail dictionary, and the number of sequences required
for this purpose.

…

This paper presents a robust global router called NTHU-Route 2.0 that improves the solution quality and runtime of NTHU-Route by the following enhancements: 1) a new history based cost function; 2) new ordering methods for congested region identification and rip-up and reroute; and 3) two implementation techniques. We report convincing experimental results to show the effectiveness of each individual enhancement. With all these enhancements together, NTHU-Route 2.0 solves all ISPD98 benchmarks with very good quality. Moreover, NTHU-Route 2.0 routes 7 of 8 ISPD07 benchmarks and 12 of 16 ISPD08 benchmarks without any overflow. Compared with other state-of-the-art global routers, NTHU-Route 2.0 is able to produce better solution quality and/or run more efficiently.

…

An important phase in the single row routing approach for multilayer printed circuit board routing is the via assignment phase. The via assignment problem, whose objective is to minimize the number of via columns used, is known to be NP-hard even when each net includes at most three vias. In this paper, we develop efficient approximation algorithms for solving this problem. When the size of each net is bounded by three, we present an algorithm which guarantees that the solution generated uses no more than 2OPT<sub>c</sub> − 1 via columns, and no more than ⌈(4/3)OPT<sub>v</sub>⌉ vias, where OPT<sub>c</sub> and OPT<sub>v</sub> are the number of via columns and vias in an optimal solution. We then extend our result to the case that the nets have arbitrary sizes. An efficient algorithm is presented which guarantees the solution generated uses no more than ⌈2.5OPT<sub>c</sub>⌉ via columns and no more than ⌈1.5OPT<sub>v</sub>⌉ vias in the worst case.

…

The four papers in this special section were originally presented at the IEEE Symposium on Application Specific Processors 2008.

…

The four papers in this special section are extended versions of papers presented at the 3rd ACM/IEEE Symposium on Networks-on-Chip (NOCS) in San Diego, CA, in 2009.

…

This paper describes the first study of the complete sequence from
process simulation to circuit performance and the corresponding
sensitivities for 0.25-μm technology. This is made possible by a
combination of physically based process models and a systematic
calibration involving SIMS, one-dimensional (1-D), and two-dimensional
(2-D) device characteristics. Simulated nFET and pFET characteristics
match hardware (HW) within 5-10% for both long-channel and nominal
length devices. Simulated ring-oscillator performance is in good
agreement with HW data. Sensitivities of device characteristics and the
inverter gate delay to process variations (within 10%) are quantified.
These investigations establish the correlation between process
variations and circuit performance.

…

A two-dimensional device simulator SNU-2D based on the
hydrodynamic model is developed for the simulation and analysis of
submicron devices. The simulator has the capacity for both
self-consistent steady-state and transient-state simulation. To obtain
better convergence and numerical stability, we adopt an improved
discretization scheme for the carrier energy flux equation and a new
strategy for the transient simulation. In steady-state simulation the
new discretization scheme shows a considerable improvement in
convergence rate and numerical accuracy compared with the existing
schemes. A transient simulation study is carried out on a deep submicron
n-MOSFET used in the sense amplifier of SRAM cells to investigate the
gate-switching characteristic. It is found that the behavior of carrier
temperature is quasi-static during the switching time even for very fast
switching speed, while the behavior of impact ionization under transient
mode deviates from that under dc mode as the switching speed increases.

…

The aliasing probabilities of multiple-input signature registers
(MISR) with m inputs for a 2<sup>m</sup>-ary symmetric channel,
where each of the (2<sup>m</sup>-1) possible errors is equally likely,
are analyzed. For this error model, the aliasing probabilities of MISRs
are analyzed using the weight distributions of
maximum-distance-separable (MDS) codes. The results show that the
aliasing probabilities over the 2<sup>m</sup>-ary symmetric channel do
not depend on the polynomials that characterize the MISRs. That is, for
the 2<sup>m</sup>-ary symmetric channel, the aliasing probability of an
MISR based on a primitive polynomial is exactly the same as one based on
a nonprimitive one. In addition, it is observed that the aliasing
probabilities, P<sub>al</sub>(n), as a function of
test length n, are monotonic for error probabilities
p = 0.2, 0.4, and 0.8. The aliasing probabilities of multiple
MISRs based on Reed-Solomon codes are analyzed again for the
2<sup>m</sup>-ary symmetric channel, using the weight distributions of
Reed-Solomon codes, which are MDS codes.
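
The polynomial-independence claim can be checked by brute force on a small case. The sketch below is an illustration under stated assumptions, not the analysis used in the paper: it models an m-bit MISR as a shift register with XOR feedback (the `taps` bitmask holds the feedback polynomial without its x<sup>m</sup> term and is assumed to have a nonzero constant term), enumerates every error sequence of length n, and sums the probability of the nonzero sequences whose signature is zero. All names are hypothetical.

```python
from itertools import product


def misr_signature(words, m, taps):
    """Signature of an m-bit MISR after absorbing `words` (ints < 2**m).

    taps: bitmask of the feedback polynomial without its x^m term.
    """
    mask = (1 << m) - 1
    state = 0
    for w in words:
        msb = (state >> (m - 1)) & 1
        state = (state << 1) & mask
        if msb:
            state ^= taps          # feedback from the bit shifted out
        state ^= w & mask          # parallel inputs are XORed in
    return state


def aliasing_probability(m, taps, n, p):
    """Exhaustive aliasing probability on a 2^m-ary symmetric channel.

    Each m-bit word is in error with probability p, and each of the
    2^m - 1 nonzero error words is equally likely.
    """
    q = p / (2 ** m - 1)
    total = 0.0
    for errors in product(range(2 ** m), repeat=n):
        weight = sum(1 for e in errors if e)
        if weight and misr_signature(errors, m, taps) == 0:
            total += q ** weight * (1 - p) ** (n - weight)
    return total


# x^3 + x + 1 (primitive) versus x^3 + 1 (nonprimitive), test length 3.
print(aliasing_probability(3, 0b011, 3, 0.2))
print(aliasing_probability(3, 0b001, 3, 0.2))
```

For this small case the two polynomials give the same value, consistent with the result stated in the abstract.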

…

The application of 2<sup>N</sup> trees to device model
approximation is described. The domain of the device model function is
partitioned using a 2<sup>N</sup> tree, with smaller partitions where
the function is more nonlinear. The function value associated with each
corner of each partition is precomputed, and the function is evaluated
at a given point by interpolation over the smallest partition that
includes that point. This technique has the advantage that highly
nonlinear functions can be modeled with modest space and time
requirements. Exponential functions, such as the subthreshold behavior
of FETs, can be accurately modeled. Accuracy levels of 1% are possible
down to currents of 10<sup>-11</sup> A. Table generation time is small;
it is only a few minutes for a MOSFET model including subthreshold
effects. This algorithm is especially suited for application to a
hardware-accelerated device model evaluator. The design of a prototype
that is capable of performing a device model evaluation of a SPICE
level-2 model including subthreshold effects in 1 μs is described.
Less detailed models, such as timing simulator models, can be evaluated
in as little as 0.2 μs.
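
The partition-and-interpolate scheme can be illustrated in one dimension (N = 1, so the 2<sup>N</sup> tree is a binary tree). This is a minimal sketch under assumptions, not the implementation from the paper: the refinement test (relative interpolation error at the cell midpoint), the tolerance, and the softplus stand-in function, which is exponential on one side and nearly linear on the other, are all illustrative choices.

```python
import math


class Node:
    """One partition: an interval with precomputed corner values."""

    def __init__(self, lo, hi, f_lo, f_hi):
        self.lo, self.hi = lo, hi
        self.f_lo, self.f_hi = f_lo, f_hi
        self.children = None


def build(f, lo, hi, rel_tol, max_depth):
    """Subdivide [lo, hi] until midpoint interpolation is within rel_tol."""
    node = Node(lo, hi, f(lo), f(hi))
    mid = 0.5 * (lo + hi)
    interp = 0.5 * (node.f_lo + node.f_hi)
    exact = f(mid)
    if max_depth > 0 and abs(interp - exact) > rel_tol * abs(exact):
        node.children = (
            build(f, lo, mid, rel_tol, max_depth - 1),
            build(f, mid, hi, rel_tol, max_depth - 1),
        )
    return node


def evaluate(node, x):
    """Descend to the smallest partition containing x, then interpolate."""
    while node.children is not None:
        left, right = node.children
        node = left if x <= left.hi else right
    t = (x - node.lo) / (node.hi - node.lo)
    return node.f_lo + t * (node.f_hi - node.f_lo)


def leaf_widths(node):
    if node.children is None:
        return [node.hi - node.lo]
    return leaf_widths(node.children[0]) + leaf_widths(node.children[1])


def softplus(x):
    # Stand-in for a device characteristic; not a real MOSFET model.
    return math.log1p(math.exp(x))


tree = build(softplus, -8.0, 8.0, rel_tol=0.01, max_depth=20)
widths = leaf_widths(tree)
print(len(widths), min(widths), max(widths))
print(evaluate(tree, -5.3), softplus(-5.3))
```

The leaves come out narrow in the exponential region and wide where the function is nearly linear, which is the space saving the abstract refers to. The midpoint test is only a heuristic, so the error elsewhere in a cell can slightly exceed the tolerance.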

…

An engineering model of the short-channel NMOS transistor which is
applicable to both room-temperature and cryogenic device operation is
presented. The model incorporates the nonuniversal dependence of the
effective channel mobility on the effective vertical field, which is
ignored in room-temperature device models. Described also is a novel
method to account for the bulk charge effect in the presence of drift
velocity saturation, channel length modulation, charge sharing by the
drain and source, and temperature dependence of the critical field. The
proposed model is verified by comparison with experimental device
characteristics obtained over a wide range of terminal voltages,
temperatures, and channel lengths.

…

BELLMAC-32A is a single-chip fully 32-bit high-end microprocessor designed in 2.5-μm twin-tub CMOS technology. This paper describes the gate matrix layout of random control logic in BELLMAC-32A with top-down hierarchical design methodology. The gate matrix layout provided (1) parallel team layout efforts, (2) adaptability to evolving logic design with short turnaround time, (3) high packing density competitive with hand layout, (4) compatibility with computer-aided layout and verification tools, (5) capability to fine-tune circuits, and (6) technology updatability. It took 6.5 engineer-years to complete the layout of random control logic with 7000 transistors although the logic design was continuously evolving during the layout period. The average packing density of gate matrix layout was 1500 μm<sup>2</sup> per transistor in random logic and 840 μm<sup>2</sup> per transistor in data path. BELLMAC-32A had more-than-three times performance improvement over its 3.5-μm technology prototype chip BELLMAC-32, in which random control logic was implemented with polycells.

…

The paper presents a novel strategy aimed at modeling instruction energy consumption of 32-bit microprocessors. Unlike earlier approaches, the proposed instruction-level power model is founded on a functional decomposition of the activities accomplished by a generic microprocessor. The proposed model has significant generalization capabilities. It allows the power figures of the entire instruction set to be estimated from the analysis of a subset, and new processors to be power-characterized using the model obtained from other microprocessors. The model is formally presented and justified and its actual application over five commercial microprocessors is included. This static characterization is the basic information for system-level power modeling of hardware/software architectures.

…

Algorithms for general surface advancement, three-dimensional
visibility, and convolution over a surface have been developed and
coupled with physical models for pattern transfer. The resulting
program, SAMPLE-3D, allows practical simulation of plasma etching and
deposition processes on engineering workstations. The physical models
are 3-D extensions of 2-D string and segment based models. The models
include secondary effects, such as material density variations and
damage enhanced etching. A general facet motion algorithm supports
simple, isotropic, cosine-directional, and general surface orientation
dependent processes. A 3-D grid of rectangular prismatic cells, which is
updated by the advancing surface, contains an alternate topography
representation for fast shadow and visibility calculation. The program
is organized as a collection of modular functions for continued model
and algorithm development. Guidelines for estimating CPU and memory
requirements for various models and simulation cases are based on an
analysis of the algorithms and data structures. Simple processes, such
as lithography development, require 1-5 min of CPU time. Simulations
involving integration over flux distributions, such as plasma etching
and sputter deposition, require from 5-30 min for typical cases.
Reflection or surface migration calculations require from 30-60 min.
Physical memory of 4-32 megabytes is sufficient for many practical
simulations.

…

A three-dimensional (3-D) process simulator, OPUS/3D, has been
developed. It has access to two-dimensional (2-D) process simulation
results. Impurity profiles and structural data simulated rigidly in 2-D
space are expanded to 3-D space at an arbitrary stage of the simulation
processes. 3-D simulation follows until the end of the process. The
access to 2-D simulation results enables OPUS/3D to handle curved
boundaries as seen in field oxides. OPUS/3D has three different methods
for expanding 2-D results. Errors due to these expansions are discussed.
OPUS/3D was applied to 3-D simulations of MOS devices at the corner edge
of the active region and in the channel region near the source/drain.
Computation time was drastically reduced by replacing part of the 3-D
simulations by 2-D simulations. Some 3-D effects are confirmed by
OPUS/3D.

…

We present PMC-3D, a parallel three-dimensional (3-D) Monte Carlo
device simulator for multiprocessors. The parallel algorithm is an
extension of the standard Monte Carlo device simulation model in 3-D, in
which the particle dynamics generated from the stochastic Monte Carlo
method are solved simultaneously with Poisson's equation on a 3-D mesh
using finite differences. Due to the large computational requirements of
3-D device simulation, it is necessary to parallelize both the Poisson
solver and the Monte Carlo simulation phase of the device simulator. The
parallel algorithms were implemented on a 1024-node distributed memory
nCUBE multicomputer and a 4-node shared memory Ardent multiprocessor. We
validate the accuracy of our implementations by generating the static
characteristics of a MESFET and present test results on the fixed and
scaled speedups obtained on the two types of parallel computers.
Improvements in performance are observed utilizing dynamic load
balancing for the distributed memory case.

…

The 3D packing problem consists of arranging nonoverlapping rectangular boxes (blocks) of given sizes in a rectangular box of minimum volume. As a representation of 3D packings, this paper proposes a novel encoding method called Double Tree and Sequence (DTS). The following are features of DTS: 1) It can represent any minimal packing. 2) It can be decoded into the corresponding 3D packing in O(n<sup>2</sup>) time, where n is the number of rectangular boxes. 3) The size of the solution space (the number of codes) of DTS is significantly smaller than any conventional representation that can represent any packing. Experimental comparisons with conventional representations indicate the superiority of the proposed representation DTS.

…

This paper presents a novel three-dimensional (3D) thermal
simulation tool for semiconductor integrated devices. The simulator is
used to automatically generate an accurate 3D physical model of the
device to be simulated from layout information. The simulator produces
an appropriate mesh of the device based on a rectangular block
structure. The mesh is automatically created such that a fine mesh is
produced around heat generation regions, but a moderate number of blocks
are used for the entire device. This paper first confirms that the
simulator produces an accurate solution to the nonlinear differential
equation describing the heat flow. Then model generation from three
example technologies (silicon trench, GaAs mesa structures, silicon on
insulator) is presented. The potential of the simulator to quickly and
easily explore the effect of layout and process variations is
illustrated, with the simulation of a two-transistor GaAs power cell as
a large example. The program incorporates a transient solver based on a
transmission line matrix (TLM) implementation using a physical
extraction of a resistance and capacitance network. The formulation
allows for temperature dependent material parameters and a nonuniform
time stepping. An example of a full transient solution of heat flow in a
realistic Si trench device is presented.

…

The problem of detection and identification of a faulty processing
element in a systolic array is addressed. A method for designing
processing elements with concurrent error detection is presented. The
|gAN|<sub>M</sub> code is shown to be an effective code for
encoding the operands in a systolic array. It is shown that the
|g3N|<sub>M</sub> code is equivalent to a residue code with
the check and information bits interchanged, for an odd number of
information bits. This allows arithmetic to be performed separately on
the information and check bits while the output can be checked by an
AN checker. An architecture and rules for designing a
self-checking processing element (PE) for systolic arrays are presented.
Both redundancy and extra delay of the self-checking PE are shown to be
low.
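
As background for the AN checker mentioned above, here is a minimal sketch of a plain AN arithmetic code with A = 3. It is not the |gAN|<sub>M</sub> code of the paper; it only illustrates the underlying idea that codewords are multiples of A, that addition preserves this property, and that a checker can therefore test divisibility by A. The function names are hypothetical.

```python
A = 3  # check constant; an assumption chosen for illustration


def encode(n):
    """Encode an integer operand as the AN codeword A * n."""
    return A * n


def is_valid(word):
    """AN checker: a word is a codeword iff it is divisible by A."""
    return word % A == 0


def decode(word):
    """Recover the operand, refusing words that fail the check."""
    if not is_valid(word):
        raise ValueError("error detected: word is not a multiple of A")
    return word // A


# The sum of two codewords is the codeword of the sum.
total = encode(5) + encode(7)
print(is_valid(total), decode(total))

# A single-bit error changes the word by +/- 2**k, which is never a
# multiple of 3, so the checker flags it.
corrupted = total ^ (1 << 4)
print(is_valid(corrupted))
```

Because the check is a divisibility test on the result, it can run concurrently with normal operation of the processing element.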

…

Programmable logic devices such as field-programmable gate arrays (FPGAs) are useful for a wide range of applications. However, FPGAs are not commonly used in battery-powered applications because they consume more power than application-specific integrated circuits and lack power management features. In this paper, we describe the design and implementation of Pika, a low-power FPGA core targeting battery-powered applications. Our design is based on a commercial low-cost FPGA and achieves substantial power savings through a series of power optimizations. The resulting architecture is compatible with existing commercial design tools. The implementation is done in a 90-nm triple-oxide CMOS process. Compared to the baseline design, Pika consumes 46% less active power and 99% less standby power. Furthermore, it retains circuit and configuration state during standby mode and wakes up from standby mode in approximately 100 ns.

…

Recounts the life and career of A. Richard Newton and provides an excerpt of his keynote address at the 1995 Design Automation Conference and an unabridged presentation of his address to the Berkeley EECS Annual Research Symposium on February 23, 2006.

…

A new (AlGa)As/GaAs MODFET integrated circuit simulator is described. Our simulator is a customized version of SPICE incorporating the extended charge control model for MODFET's and the velocity saturation model for ungated FET's used as the load devices. Comparison of our simulator results with the measured data shows good agreement for both dc and transient responses. We also propose a set of analytically derived design guidelines for MODFET inverter stages. Design parameters such as the optimum ratio of driver to load saturation currents, noise margins and switching delays can be readily related to the device process parameters. Our simulation results indicate that the inverter speed increases with increasing driver threshold voltages but there is an optimum threshold voltage, of approximately 0.4 V for our devices, which provides the highest noise margin.

…

Alternating-aperture phase shift masking (AAPSM), a form of strong resolution enhancement technology, will be used to image critical features on the polysilicon layer at smaller technology nodes. This technology imposes additional constraints on the layouts beyond traditional design rules. Of particular note is the requirement that all critical features be flanked by opposite-phase shifters while the shifters obey minimum width and spacing requirements. A layout is called phase assignable if it satisfies this requirement. Phase conflicts have to be removed to enable the use of AAPSM for layouts that are not phase assignable. Previous work has sought to detect a suitable set of phase conflicts to be removed as well as correct them. This paper has two key contributions: 1) a new computationally efficient approach to detect a minimal set of phase conflicts, which when corrected will produce a phase-assignable layout, and 2) a novel layout modification scheme for correcting these phase conflicts with small layout area increase. Unlike previous formulations of this problem, the proposed solution for the conflict detection problem does not frame it as a graph bipartization problem. Instead, a simpler and more computationally efficient reduction is proposed. This simplification greatly improves the runtime while maintaining the same improvements in the quality of results obtained in Chiang (Proc. DATE, 2005, p. 908). An average runtime speedup of 5.9× is achieved using the new flow. A new layout modification scheme suited for correcting phase conflicts in large standard-cell blocks is also proposed. The experiments show that the percentage area increase for making standard-cell blocks phase assignable ranges from 1.7% to 9.1%.

…

In this paper, the transient response of arbitrarily terminated
nonuniform transmission lines with frequency-dependent parameters is
analyzed by the introduction of ABCD matrices and the waveform
relaxation (WR) method. A differential equation describing the ABCD
matrices of nonuniform transmission lines is derived and then solved
efficiently with the WR method. A convergence theorem is proven,
according to which the nonuniform transmission line is segmented into a
number of cascaded subnetworks to increase the convergence speed. An
example of a nonuniform transmission system is analyzed. The results are
comparable to those of the convolution-characteristics method.

…

With continuous advances in radio-frequency (RF) mixed-signal very large scale integration (VLSI) technology, the creation of eddy currents in lossy multilayer substrates has made the already complicated interconnect analysis and modeling issue more challenging. To account for substrate losses, traditional electromagnetic methods are often computationally prohibitive for today's VLSI geometries. In this paper, an accurate and efficient interconnect modeling approach, the eddy-current-aware partial equivalent element circuit (EPEEC), is proposed. Based on complex image theory, it extends the traditional partial equivalent element circuit (PEEC) model to simultaneously take multilayer substrate eddy-current losses and frequency-dependent effects into consideration. To accommodate even larger scale on-chip interconnect networks, EPEEC develops a new simulation program with integrated circuit emphasis (SPICE)-compatible reluctance extraction algorithm by applying sparsification in the inverse inductance domain with an extended window algorithm. Compared with several industry standard inductance and full-wave solvers, such as FastHenry and Sonnet, EPEEC demonstrates within 1.5% accuracy while providing over 100× speedup.

…

The design of checkers aimed at the concurrent test of analog and
mixed-signal circuits is considered in this paper. These checkers can
on-line test duplicated and fully differential analog circuits. The test
approach is based on exploiting the inherent redundancy of these
circuits which results in the use of a code for the analog signals. The
analog code is monitored by the checkers. An error signal which complies
with existing digital self-checking parts is generated in the case that
a code falls out of the valid code space. For the verification of the
analog codes, absolute tolerance margins and tolerance margins which are
made relative to signal amplitude are considered. A test pattern
generator for off-line testing of the checkers is proposed.

…

As a first step, most model checkers used in the hardware industry convert a high-level register-transfer-level (RTL) design into a netlist. However, algorithms that operate at the netlist level are unable to exploit the structure of the higher abstraction levels and, thus, are less scalable. The RTL of a hardware description language such as Verilog is similar to a software program with special features for hardware design such as bit-vector arithmetic and concurrency. This paper uses predicate abstraction, a software verification technique, for verifying RTL Verilog. There are two challenges when applying predicate abstraction to circuits: 1) the computation of the abstract model in presence of a large number of predicates and 2) the discovery of suitable word-level predicates for abstraction refinement. We address the first problem using a technique called predicate clustering. We address the second problem by computing the weakest preconditions of Verilog statements in order to obtain new word-level predicates during abstraction refinement. We compare the performance of our technique with localization reduction, a netlist-level abstraction technique, and report improvements on a set of benchmarks.

…

SystemC's growing community for system-level design exploration is a result of SystemC's capability of modeling at register transfer level (RTL) and above RTL abstraction levels. However, a synthesis path from SystemC at abstraction layers above RTL is still in its infancy. A recent extension of SystemC, which is called Bluespec-SystemC electronic system level (BS-ESL), counters this difficulty with its model of computation employing atomic rule-based specifications and synthesis to Verilog. In order to simulate a model consisting of one part designed in SystemC and another using BS-ESL, we require an interoperability semantics and implementation of such a semantics. To illustrate the problem, we formalize the simulation semantics of BS-ESL and discrete-event simulation of RTL SystemC, and provide a solution based on this formalization.

…

We propose an abstraction refinement method for invariant checking, which is based on the simultaneous analysis of all abstract counterexamples of shortest length in the current abstraction. The algorithm is focused on an improved Ariadne's Bundle of SORs (Synchronous Onion Rings) of the abstract model; the transitions through these SORs contain all shortest ACEs (Abstract Counterexamples) and no other ACEs. The SORs are exploited in two distinct ways to provide global guidance to the abstraction refinement process: (1) Refinement variable selection is based on the entirety of transitions connecting the SORs, and (2) a SAT-based concretization test is formulated to test all ACEs in the SORs at once. We call this test multi-thread concretization. The scalability of our refinement algorithm is ensured in the sense that all the analysis and computation required in our refinement algorithm are conducted on the abstract model. The abstraction efficiency of a given abstraction refinement algorithm measures how much of the concrete model is required to make the decision. We include experimental comparisons of our new method with recently published techniques. The results show that our scalable method, based on global guidance from the entire bundle of shortest ACEs, outperforms these other methods in terms of both run time and abstraction efficiency.

…

We present a network interface (NI) for an on-chip network. Our NI decouples computation from communication by offering a shared-memory abstraction, which is independent of the network implementation. We use a transaction-based protocol to achieve backward compatibility with existing bus protocols such as AXI, OCP, and DTL. Our NI has a modular architecture, which allows flexible instantiation. It provides both guaranteed and best-effort services via connections. These are configured via NI ports using the network itself, instead of a separate control interconnect. An example instance of this NI with four ports has an area of 0.25 mm<sup>2</sup> after layout in 0.13-μm technology, and runs at 500 MHz.

…

We describe new techniques for model checking in the counterexample-guided abstraction-refinement framework. The abstraction phase "hides" the logic of various variables, hence considering them as inputs. This type of abstraction may lead to "spurious" counterexamples, i.e., traces that cannot be simulated on the original (concrete) machine. We check whether a counterexample is real or spurious with a satisfiability (SAT) checker. We then use a combination of 0-1 integer linear programming and machine learning techniques for refining the abstraction based on the counterexample. The process is repeated until either a real counterexample is found or the property is verified. We have implemented these techniques on top of the model checker NuSMV and the SAT solver Chaff. Experimental results prove the viability of these new techniques.
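
The loop described in this abstract (abstract, check, test the counterexample, refine, repeat) can be sketched on a toy design. This is an illustration under stated assumptions, not the paper's method: the 3-bit design, all function names, and the property are hypothetical; the SAT-based spuriousness check is replaced by plain simulation, which suffices only because the toy design is deterministic with a single initial state; and the ILP and machine-learning refinement is replaced by a placeholder that reveals hidden variables in alphabetical order.

```python
from itertools import product

# Hypothetical 3-bit design: one next-state function per variable.
NEXT = {
    "a": lambda s: s["a"],
    "b": lambda s: s["a"],
    "c": lambda s: s["a"] and s["b"],
}
INIT = {"a": False, "b": False, "c": False}
VARS = sorted(NEXT)


def is_bad(s):
    """Property to check: c is never true."""
    return s["c"]


def abstract_counterexample(visible):
    """BFS over the abstract model; hidden variables act as free inputs."""
    hidden = [v for v in VARS if v not in visible]
    start = tuple(INIT[v] for v in visible)
    parent = {start: None}
    frontier = [start]
    while frontier:
        next_frontier = []
        for node in frontier:
            state = dict(zip(visible, node))
            if is_bad(state):
                trace = []
                while node is not None:  # walk back to the initial state
                    trace.append(dict(zip(visible, node)))
                    node = parent[node]
                return trace[::-1]
            for values in product([False, True], repeat=len(hidden)):
                full = {**state, **dict(zip(hidden, values))}
                succ = tuple(NEXT[v](full) for v in visible)
                if succ not in parent:
                    parent[succ] = node
                    next_frontier.append(succ)
        frontier = next_frontier
    return None


def is_real(trace):
    """Replay the abstract trace on the concrete design by simulation."""
    s = dict(INIT)
    for step in trace:
        if any(s[v] != val for v, val in step.items()):
            return False  # concrete run diverges: the trace is spurious
        s = {v: NEXT[v](s) for v in VARS}
    return True


def cegar(visible):
    visible = list(visible)
    while True:
        trace = abstract_counterexample(visible)
        if trace is None:
            return "verified", visible
        if is_real(trace):
            return "counterexample", trace
        # Placeholder refinement: reveal the next hidden variable.
        visible.append(next(v for v in VARS if v not in visible))


result, detail = cegar(["c"])
print(result, detail)  # prints: verified ['c', 'a']
```

In this run the first abstraction (only `c` visible) yields a spurious trace, and revealing `a` is enough to prove the property while `b` stays hidden.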

…

The major barrier that prevents the application of formal verification to large designs is state explosion. This paper presents a new approach for verification of timed circuits using automatic abstraction. This approach partitions the design into modules, each with constrained complexity. Before verification is applied to each individual module, information irrelevant to the behavior of the selected module is abstracted away. This approach converts a verification problem with big exponential complexity to a set of subproblems, each with small exponential complexity. Experimental results are promising in that they indicate that our approach has the potential of completing much faster while using less memory than traditional flat analysis.
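
A rough worked example shows the scale of the claimed reduction. The figures are assumptions for illustration and do not come from the paper: take a design with $n$ binary state variables, split evenly into $k$ modules, and suppose abstraction leaves each module with about $n/k$ relevant variables.

$$
2^{n} \;\longrightarrow\; k \cdot 2^{n/k}, \qquad \text{e.g. } n = 40,\ k = 4:\quad 2^{40} \approx 1.1 \times 10^{12} \ \text{states versus}\ 4 \cdot 2^{10} = 4096 \ \text{states}.
$$

In practice each module also keeps some interface behavior of its neighbors, so the real saving is smaller than this idealized bound.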

…

This paper proposes a novel approach to solve the allocation and scheduling problems for variable voltage/frequency multiprocessor systems-on-chip, which minimizes overall system energy dissipation. The optimality of derived system configurations is guaranteed, while the computational efficiency of the optimizer allows for solving problem instances that were traditionally considered beyond reach for exact solvers (optimality gap). Furthermore, this paper illustrates the development- and run-time software infrastructures that assist the user in developing applications and implementing optimizer solutions. The proposed approach guarantees a high level of power, performance, and constraint satisfaction predictability, as confirmed by validation on the target platform, thus bridging the abstraction gap.

…

In this paper we propose a data structure for abstracting the delay information of a combinatorial circuit. The particular abstraction that we are interested in is one that preserves the delays between all pairs of inputs and outputs in the circuit. Such abstractions are useful when considering the delay of cascaded circuits in high-level synthesis and other such applications in synthesis. The proposed graphical data structure is called the concise delay network, and is of size proportional to (m+n) in the best case, where m and n refer to the number of inputs and outputs of the circuit. In comparison, a delay matrix that stores the maximum delay between each input-output pair has size proportional to m×n. For circuits with hundreds of inputs and outputs, this storage and the associated computations become quite expensive, especially when they need to be done repeatedly during synthesis. We present heuristic algorithms for deriving these concise delay networks. Experimental results show that, in practice, we can obtain concise delay networks with the number of edges being a small multiple of (m+n).
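
The size difference between a delay matrix and a delay network can be seen in a small best-case sketch. It does not reproduce the paper's heuristic algorithms: the hub-shaped network, the node names, and the delay values are all hypothetical. When every input-to-output delay has the form `a[i] + b[o]`, a network with one internal node and m+n weighted edges encodes the same information as the m×n matrix, and the pairwise delays are recovered as longest paths.

```python
from collections import defaultdict
from functools import lru_cache


def longest_delays(edges, inputs, outputs):
    """Return {(i, o): longest path delay} for a weighted DAG of (u, v, w) edges."""
    succ = defaultdict(list)
    for u, v, w in edges:
        succ[u].append((v, w))

    @lru_cache(maxsize=None)
    def longest(u, target):
        if u == target:
            return 0
        best = float("-inf")  # stays -inf if target is unreachable from u
        for v, w in succ[u]:
            best = max(best, w + longest(v, target))
        return best

    return {(i, o): longest(i, o) for i in inputs for o in outputs}


# Hypothetical circuit with m = 3 inputs and n = 2 outputs.
a = {"i0": 2, "i1": 5, "i2": 1}  # delay from each input to the internal node
b = {"o0": 3, "o1": 4}           # delay from the internal node to each output
inputs, outputs = list(a), list(b)

# Delay matrix: one entry per input-output pair (m * n entries).
matrix = {(i, o): a[i] + b[o] for i in inputs for o in outputs}

# Delay network: one edge per input plus one per output (m + n edges).
edges = [(i, "hub", a[i]) for i in inputs] + [("hub", o, b[o]) for o in outputs]

assert longest_delays(edges, inputs, outputs) == matrix
print(len(matrix), len(edges))  # prints: 6 5
```

With hundreds of inputs and outputs the gap widens quickly: 200 inputs and 200 outputs give 40,000 matrix entries against 400 edges in this best case.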

…

In this paper, an algorithm is presented for the verification of the equivalence of two sequential circuit descriptions at the same or differing levels of abstraction, namely at the register-transfer (RT) level and the logic level. The descriptions represent general finite automata at the differing levels; a finite automaton can be described in an ISP-like language and its equivalence to a logic-level implementation can be verified using our algorithm. Two logic-level automata can be similarly verified for equivalence. Previous approaches to sequential circuit verification have been restricted to verifying relatively simple descriptions with small amounts of memory. Unlike these approaches, our technique is shown to be computationally efficient for much more complex circuits. The efficiency of our algorithm lies in the exploitation of don't-care information derivable from the RTL or logic-level description (e.g., invalid input and output sequences) during the verification process. Using efficient cube enumeration procedures at the logic level, we have been able to verify the equivalence of finite automata with a large number of states in small amounts of CPU time.

…

Design debugging is one of the major remaining manual processes in the semiconductor design cycle. Despite recent advances in the area of automated design debugging, more effort is required to cope with the size and complexity of today's designs. This paper introduces an abstraction and refinement methodology to enable current debuggers to operate on designs that are orders of magnitude larger than otherwise possible. Two abstraction techniques are developed with the goals of improving debugger performance for different circuit structures: State abstraction is aimed at reducing the problem size for circuits consisting purely of primitive gates, while function abstraction focuses on designs that also contain modular and hierarchical information. In both methods, after an initial abstracted model is created, the problem can be solved by an existing automated debugger. If an error site is abstracted, refinement is necessary to reintroduce some of the abstracted components back into the design. This paper also presents the underlying theory to guarantee correctness and completeness of a debugging tool that operates using the proposed methodology. Empirical results demonstrate improvements in run time and memory capacity of two orders of magnitude over a state-of-the-art debugger on a wide range of benchmark and industrial designs.

…

This paper presents a compositional method with failure-preserving abstraction for scalable asynchronous design verification. It combines efficient state-space reductions and novel interface refinement and can dramatically reduce the complexity of state space while decreasing the introduction of false failures. This allows much larger designs to be verified as demonstrated in the experimental results.

…

Multilevel simulation is a technique for specifying and testing VLSI chip designs. This paper compares three methods of implementing such simulators: (a) Programming using a high-level language that does not have data abstraction, e.g., Pascal. (b) Programming with procedure-oriented data abstractions as in Simula [3] or CLU [4]. (c) Programming with enhanced C's (EC) [2] data abstractions that are macro-oriented. LIST, a generator for simulation programs based on the data abstraction technique, is described briefly. It is concluded that the use of data abstractions offers structure, hierarchy, and simplicity. The use of EC has the additional advantage of run-time efficiency.

…

We present a method to address state explosion in timed circuit verification by using abstraction directed by the failure model. This method allows us to decompose the verification problem into a set of subproblems, each of which proves that a specific failure condition does not occur. To each subproblem, abstraction is applied using safe transformations to reduce the complexity of verification. The abstraction preserves all essential behaviors conservatively for the specific failure model in the concrete description. Therefore, no violations of the given failure model are missed when only the abstract description is analyzed. An algorithm is also shown to examine the abstract error trace to either find a concrete error trace or report that it is a false negative. We present results using the proposed failure directed abstractions as applied to two large timed circuit designs.

…

Aggregation abstraction is a way of defining a desired correspondence between an implementation of a transaction-oriented protocol and a much simpler idealized version of the same protocol. This relationship can be formally verified to prove the correctness of the implementation. We present a technique for checking aggregation abstractions automatically using a finite-state enumerator. The abstraction relation between implementation and specification is checked on the fly and the verification requires examining no more states than checking a simple invariant property. This technique can be used alone for verification of finite-state protocols, or as preparation for a more general aggregation proof using a general-purpose theorem-prover. We illustrate the technique on the cache coherence protocol used in the FLASH multiprocessor system.

…

In floorplanning, it is common that a designer wants to have certain modules abutting one another in the final packing. The problem of controlling the relative positions of an arbitrary number of modules in floorplan design is nontrivial. A slicing floorplan has an advantageous feature in which the topological structure of the packing can be found without knowing the module dimensions. This feature is good for handling placement constraints in general. In this paper, we make use of it to solve the abutment problem in the presence of L- and T-shaped modules. This is done by a procedure which explores the topological structure of the packing and finds the neighborhood relationship between every pair of modules in linear time. Our main contribution is a method that can handle abutment constraints in the presence of L- or T-shaped modules in such a way that the shape flexibility of the soft modules can still be fully exploited to obtain a tight packing. We tested our floorplanner with some benchmark data and the results are promising.

…

AC testing of integrated circuits and assemblies is gaining importance as the trend continues in favor of fewer defects shipped and use of higher performance technologies. While there is a large body of literature on test generation and fault simulation related to ac test, the optimization of timing on the tester has been unexplored. This paper defines the problems associated with optimization of the test application timing for a class of test equipment. Two approaches to test application timing are introduced. The notion of slack is used to define the objective function for optimization. The optimization problem is shown to be NP-complete even for non-reconvergent fanout circuits. Heuristics are presented for the optimization problems and the results compared with bounds on test circuits.

…

Top-cited authors