Efficient Power Network Analysis Considering Multidomain Clock Gating

Wanping Zhang, Student Member, IEEE, Wenjian Yu, Member, IEEE, Xiang Hu, Student Member, IEEE, Ling Zhang, Rui Shi, He Peng, Student Member, IEEE, Zhi Zhu, Lew Chua-Eoan, Member, IEEE, Rajeev Murgai, Member, IEEE, Toshiyuki Shibuya, Member, IEEE, Noriyuki Ito, and Chung-Kuan Cheng, Fellow, IEEE

Abstract—In this paper, an efficient framework is proposed to analyze the worst case of voltage variation of power network considering multidomain clock gating. First, a frequency-domain-based simulation method is proposed to obtain the time-domain voltage response. With the vector fitting technique, the frequency-domain responses are approximated by a partial fraction expression, which can be easily converted to a time-domain waveform. Then, an algorithm is proposed to find the worst-case voltage variation and corresponding clock gating patterns, through superimposing the voltage responses caused by all domains working separately. The major computation of the whole framework is solving the frequency-domain equation system, whose complexity is about $O(N^\alpha D \log f_{\text{max}})$, where $\alpha$ is between one and two if using an iterative solver from the PETSc library, $N$ is the number of nodes, $f_{\text{max}}$ is the upper bound of frequency, and $D$ is the number of clock domains. Numerical results show that the proposed simulation method is up to several hundred times faster than commercial fast simulators, like HSPICE and MSPICE. In addition, the proposed method is able to analyze large-scale power networks that the commercial tools cannot handle.

Index Terms—Frequency-domain analysis, multidomain clock gating, power network, vector fitting (VF), worst-case voltage variation.

I. INTRODUCTION

POWER GROUND (P/G) networks supply power from the P/G pads on a chip to the circuit modules [1]. With the rapid increase of working frequency and continuous scaling of very large scale integration technology, it is becoming more and more important to analyze power networks efficiently and accurately.

For power network analysis, two kinds of voltage noise need to be considered: IR drop caused by the power grid resistance and simultaneous switching noise ($Ldi/dt$ drop) due to on-chip and packaging inductances [1], [2]. The voltage variation leads to adverse impacts on chip, package, and board performance, such as longer time delay or logic failure. The noise problem may be even worse when the frequency of input source matches the natural frequency of the power network. Therefore, an efficient simulation method is needed not only to predict the maximum voltage variation but also to reveal the natural frequency information for resonance avoidance.

Many previous works focused on the efficient time-domain transient analysis of large-scale power networks. In some works, the circuit size is reduced by using methods such as circuit partitioning [3], multigridlike technique [4], and hierarchical model reduction [5]. In others, the circuit simulation is accelerated by fast linear equation solvers. They include the direct solver “KLU” [6], iterative solvers like the preconditioned conjugate gradient method [7], and generalized minimal residual method [8].

On the other hand, a lot of work has been done to predict the worst-case voltage variation of power network. Bai et al. [9] proposed MIMAX algorithm to generate a tight upper bound on the maximum macroblock current envelope, which leads to a maximum voltage drop. Shi et al. [10] introduced an algorithm to predict the worst-case logical timing correlations among the cells which cause the voltage resonance. Lin et al. [11] proposed a full-chip vectorless approach for dynamic power integrity analysis.

The analysis of voltage variation becomes more complicated for the low-power circuits with multiple clock domains. The technique of multidomain clock gating has been used to reduce unnecessary power dissipation, by disabling the clock signals for some modules [12]. However, certain clock-gating patterns may induce the voltage resonance of the power network. Predicting the worst-case voltage variation caused by multidomain clock gating is not only necessary but also a challenge for the power integrity analysis of multi-clock-domain circuits.

In this paper, we propose an efficient framework to analyze the power network considering multidomain clock gating. The worst-case clock-gating patterns and the corresponding maximum voltage variation are predicted. The major contributions of this paper are as follows.

1) A time-domain simulation method based on frequency-domain analysis and vector fitting (VF) [13] is proposed. With the VF technique, the frequency-domain responses are approximated by a partial fractional expression. Then, it can be easily converted to the time-domain transient waveform. With the techniques of log-scale frequency sampling and efficient iterative linear equation solver, the simulation method is orders of magnitude faster than the conventional time-domain simulation methods, while preserving sufficient accuracy.

2) An algorithm is proposed to predict the maximum voltage variation of power network for circuits with multidomain clock gating. The algorithm utilizes the time-domain voltage response corresponding to a single clock domain working for one cycle, which is simulated with the proposed frequency-domain-based method. With the voltage responses from all individual domains, the worst-case clock patterns and the corresponding maximum voltage variation are predicted using the principle of superposition. This algorithm is of linear computational complexity and is able to analyze large-scale power networks with more than one million nodes.

The rest of this paper is organized as follows. The problem statement is given in Section II. In Section III, a time-domain simulation based on the frequency-domain analysis and the VF technique is introduced. The algorithm to predict the worst-case clock-gating patterns for power network with multidomain clock gating is presented in Section IV. The last two sections include the numerical results and conclusions. Some preliminary results of this paper were presented in [18]. We extend it with more technical details and more numerical results, including that from parallelizing the proposed method. The errors of presentation in [18] are also corrected.

II. POWER NETWORK WITH MULTIDOMAIN CLOCK GATING

The power network is usually modeled as a circuit, including resistance, capacitance, and packaging inductance, like that shown in Fig. 1. Time-varying current sources are connected to some circuit nodes, characterizing the behavior of active circuit instances. These current sources draw current from the power network and cause voltage fluctuations [11]. The waveform of the current source is usually described as a piecewise linear (PWL) function.

In the low-power design with multiple clock domains, the circuit instances belong to different clock domains. The circuit instances within the same clock domain work synchronously. Each domain is governed by one clock controlling signal. Fig. 1 shows such a configuration with four clock domains. The value of the clock signal indicates whether the instances in the domain work or sleep in the current clock cycle, and the sequence of clock signal values is called the clock-gating pattern. Bit “1” in the pattern means that the instances in the domain work for this cycle, while bit “0” represents the sleep mode of the instances. The clock-gating pattern affects the voltage fluctuation of power network, because it determines the behavior of current sources in the model.

The main purpose of this paper is to determine the clock-gating patterns for all domains that cause the maximum voltage variation at given nodes of power network. In the following, we summarize the assumptions taken in this paper.

1) The current profiles of current sources are known and described as PWL functions.
2) We assume that a current source has the same waveform for different working cycles. This scenario is supposed to correspond to the worst case of voltage fluctuation.
3) If a circuit instance is under the sleep mode, the corresponding current source is assumed to have zero current. This assumption omits the leakage current, but the proposed method can be easily extended to consider it.
4) The clock-gating patterns for different domains are independent from each other.

To consider the influence of multidomain clock gating on supply voltage variation, we divide the task into two steps. First, the time-domain voltage response of the power network is simulated with a single clock domain working for one cycle. Then, with the simulation results of all clock domains, we propose an algorithm to find the maximum voltage variation and corresponding worst-case clock-gating patterns. Fig. 2 shows the whole analysis flow. The two steps are described in Sections III and IV, respectively.

III. TIME-DOMAIN SIMULATION BASED ON FREQUENCY-DOMAIN ANALYSIS AND VF

In this section, a method based on the frequency-domain analysis and VF technique is proposed to calculate the time-domain voltage waveform. This method is used to obtain the voltage response at a given node of power network, with a single clock domain working for one cycle.

Fig. 2. Whole analysis flow for the power network with multidomain clock gating.

Fig. 3. Proposed method for time-domain simulation.

A. Basic Idea

Fig. 3 describes the flow of the frequency-domain-based simulation method. We first convert the current sources from a time-domain waveform to a frequency-domain expression with Laplace transform. Since each input current source $I(t)$ is described as a PWL function, its frequency-domain expression can be derived analytically. Then, a linear equation system $A(s)V(s) = I(s)$ is formulated for frequency-domain analysis. After solving the frequency-domain equation, we obtain the voltage response at a specified frequency. The VF technique is adopted to fit the voltages $V(s)$ at frequency samples with a partial fractional expression $\tilde{v}(s)$. Finally, the partial fractional expression can be easily converted to the time-domain waveform $v(t)$.

With little sacrifice in accuracy, this method provides an alternative to time-domain transient simulation. Since it is based on frequency-domain analysis, one can easily obtain the natural frequency information of the power network, which is useful for a comprehensive understanding of power noise. Furthermore, with the efficient techniques discussed hereinafter, this method demonstrates a large speedup compared with the conventional time-domain simulation for large-scale power networks.

B. Laplace Transform of Input Current Source

We apply Laplace transform to the PWL function. Suppose that $r(t)$ denotes the unit ramp function

$$r(t) = t\,u(t) \tag{1}$$

where $u(t)$ is the unit step function. The frequency-domain expression of the ramp function is

$$R(s) = \frac{1}{s^2}. \tag{2}$$

A PWL function $f(t)$ can be regarded as the superposition of several ramp functions, as shown in Fig. 4

$$f(t) = \sum_{i} a_i r(t - t_i) \tag{3}$$

where $t_i$ is the starting time point of the $i$th ramp segment and $a_i$ is the difference between the slopes of two adjacent segments.
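The ramp decomposition in (3) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the function names `pwl_to_ramps` and `pwl_laplace` are hypothetical, the waveform is assumed to start at zero, and the transform of each shifted ramp uses the standard time-shift property, $e^{-s t_i}/s^2$.

```python
import cmath


def pwl_to_ramps(points):
    """Return (t_i, a_i) pairs such that f(t) = sum_i a_i * r(t - t_i).

    `points` is a list of (time, value) breakpoints of a continuous PWL
    waveform. Assumptions of this sketch: the first value is zero and the
    waveform stays constant after the last breakpoint.
    """
    if points[0][1] != 0:
        raise ValueError("waveform must start at zero for a pure ramp sum")
    ramps = []
    prev_slope = 0.0
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        slope = (f1 - f0) / (t1 - t0)
        # a_i is the change of slope at breakpoint t_i
        ramps.append((t0, slope - prev_slope))
        prev_slope = slope
    # Cancel the last slope so the waveform is flat afterwards
    ramps.append((points[-1][0], -prev_slope))
    return ramps


def pwl_laplace(points, s):
    """Evaluate F(s) = sum_i a_i * exp(-s * t_i) / s**2 for s != 0."""
    return sum(a * cmath.exp(-s * t) for t, a in pwl_to_ramps(points)) / s**2


# Hypothetical triangular current pulse (arbitrary units)
triangle = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(pwl_to_ramps(triangle))
print(pwl_laplace(triangle, 1.0))
```

For the triangular pulse, the result can be checked against the closed form $\left((1 - e^{-s})/s\right)^2$.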
Numerical results show that a power network with more than one million nodes can be easily analyzed by the efficient solver from PETSc.

To describe the complete spectrum of a voltage response, we need to choose a set of frequency sampling points. For each frequency sample, (5) is solved to get the voltage response. The highest frequency in the spectrum depends on the input current sources and the nature of the power network. In practical applications, the upper bound of the frequency spectrum is usually not more than several tens of gigahertz. Logarithmic-scale sampling is therefore adopted to keep the number of frequency samples moderate, with an adequate number of points chosen in each frequency decade. The number of frequency samples is obtained from an empirical formula, based on extensive testing of industrial power networks, that trades off accuracy against efficiency. With this technique, the number of frequency points is $O(\log f_{\text{max}})$, where $f_{\text{max}}$ is the upper bound of frequency.
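As a rough illustration of logarithmic-scale sampling, the following Python sketch generates a fixed number of points per decade. The lower bound `f_min`, the value of `points_per_decade`, and the function name are assumptions made for this example; the empirical formula used in the paper is not reproduced here.

```python
import math


def log_frequency_samples(f_min, f_max, points_per_decade):
    """Return log-spaced frequencies (Hz) from f_min to f_max inclusive."""
    decades = math.log10(f_max / f_min)
    # At least `points_per_decade` samples in every decade, plus the endpoint
    count = max(2, math.ceil(decades * points_per_decade) + 1)
    step = decades / (count - 1)
    return [f_min * 10 ** (k * step) for k in range(count)]


# Hypothetical setting: 1 MHz to 4 GHz with ten points per decade
samples = log_frequency_samples(1e6, 4e9, 10)
print(len(samples))
```

Raising `f_max` by one decade adds only about `points_per_decade` samples, which is the $O(\log f_{\text{max}})$ growth noted above.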

D. Convert the Frequency-Domain Response to Time-Domain Waveform

The VF technique is a general method for the fitting of frequency-domain responses with rational function approximations [13]. It converts a nonlinear problem of least square approximation to a linear problem in two stages, where the pole locations are determined in an iterative manner. In addition, it is guaranteed that the resulting approximation has stable poles. The VF technique has been developed into a robust numerical package shared in public domain [15], [16].

With the frequency-domain responses at a given node, the VF technique is used to fit the voltage points with a partial fractional expression

$$\bar{v}_k(s) = \sum_{i=1}^{N_a} \frac{r_i}{s - p_i} \tag{6}$$

where $\bar{v}_k$ stands for the voltage at node $k$. Residues $r_i$'s and poles $p_i$'s are obtained with the VF algorithm and either are real quantities or come in complex conjugate pairs. With (6), the time-domain response can be easily derived

$$v_k(t) = \sum_{i=1}^{N_a} \left[ r_i e^{p_i t} u(t) \right]. \tag{7}$$

The major computation in VF is to solve a linear least squares (LLS) problem, whose coefficient matrix has size $2m \times 2N_a$. Here, $m$ is the number of frequency samples and $N_a$ is the order of approximation. The LLS problem can be solved by the method of normal equations or by QR decomposition, with a computational complexity of $O(mN_a^2)$. Usually, the response of a power network does not include many resonance peaks, and a low order $N_a$ gives an approximation with sufficient accuracy.
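The conversion from the partial fraction form (6) to the time-domain waveform (7) can be sketched in a few lines of Python. The residues and poles below are hypothetical values chosen for illustration, not the output of an actual VF run, and `time_response` is an assumed helper name.

```python
import cmath
import math


def time_response(residues, poles, t):
    """Evaluate v(t) = sum_i r_i * exp(p_i * t) * u(t)."""
    if t < 0:
        return 0.0  # unit step u(t)
    total = sum(r * cmath.exp(p * t) for r, p in zip(residues, poles))
    # Residues and poles are real or come in conjugate pairs, so the
    # imaginary parts cancel up to round-off and only the real part is kept.
    return total.real


# Hypothetical conjugate pair: equivalent to exp(-t) * cos(t)
residues = [0.5, 0.5]
poles = [-1 + 1j, -1 - 1j]
print(time_response(residues, poles, 0.3))
```

Because both hypothetical poles have negative real parts, the waveform decays to zero, matching the stable-pole property of VF mentioned above.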

E. Computational Complexity and Discussion

In the frequency-domain-based simulation method, the computational time is mostly spent on solving the frequency-domain equation and performing the VF. The time complexity for solving the frequency-domain linear equation system is about $O(N^\alpha \log f_{\text{max}})$, where $N$ is the node number of the power network and the term $\log f_{\text{max}}$ represents the number of frequency samples. We assume that the complexity of solving one equation is $O(N^\alpha)$, where $\alpha$ is a quantity between one and two if using the efficient CGS solver. The time complexity of VF is $O(N_a^2 \log f_{\text{max}})$, where $N_a$ is the order of approximation.

For large-scale power network, the time for solving an equation dominates the total computational time, because the node number $N$ is much larger than $N_a$. If the voltage responses of multiple nodes on power network are considered, the time for VF will be multiplied by the number of output nodes $N_{\text{out}}$. For analysis of maximum voltage variation, only some nodes at the lowest level of P/G grid are considered. Therefore, $N_{\text{out}}$ is a small number.

The computational time of the frequency-domain-based simulation method is independent of the number of time steps in a conventional transient simulation. Furthermore, since the nodal analysis approach is sufficient for generating the frequency-domain equations, solving each linear equation system is easier than that in conventional transient simulation. The latter usually involves a larger linear equation system generated with the modified nodal analysis. In addition, the proposed simulation method can be easily parallelized. Because the solutions of (5) at different frequency samples are independent of each other, the work can be distributed to multiple processors. These three points indicate the advantage of the proposed method over the conventional time-domain simulation methods. The numerical results in Section V-B validate the aforementioned analysis.

IV. FIND THE WORST CASE OF VOLTAGE VARIATION CONSIDERING MULTIDOMAIN CLOCK GATING

In the following, we first derive the voltage response for an arbitrary clock-gating pattern, using the response corresponding to the current sources working for one cycle. Then, the algorithm to predict the maximum voltage variation considering multidomain clock gating is presented.

A. Voltage Response for an Arbitrary Clock-Gating Pattern

We first consider the situation where there is only one clock domain. Suppose that all current sources only work for the first clock cycle. The voltage of power network will fluctuate for several cycles before reaching a steady state, due to the resonance in circuit. We use $y_0(t)$ to denote the voltage response at a given node. For the situation where the current sources work for multiple cycles with an arbitrary clock-gating pattern, the voltage response can be derived using $y_0(t)$ and the principle of superposition.

Suppose that $f_i(t)$ denotes the first-cycle waveform of the $i$th current source. Its waveform within the first $k$ cycles can be expressed as

$$g_i(t) = \sum_{l=0}^{k-1} b_l f_i(t - lT), \quad i = 1, \ldots, N_s \tag{8}$$

where $T$ is the clock cycle time and sequence $\{b_l\}$ represents the clock-gating pattern. $N_s$ is the total number of current sources. If the clock domain is enabled at the $l$th cycle, $b_l = 1$, otherwise $b_l = 0$.

Because all current sources in the domain work synchronously and the power network is a linear circuit, the voltage response corresponding to the arbitrary clock-gating pattern becomes

$$y(t) = \sum_{l=0}^{k-1} b_l y_0(t - lT). \tag{9}$$

If $y_0(t)$ reaches its steady state after $n$ cycles, $l$ in (9) needs to satisfy $0 < t - lT < nT$ to contribute a nonzero value to the summation. That is

$$\frac{t}{T} - n < l < \frac{t}{T}. \tag{10}$$

This means that we just need to check at most $n$ bits of clock signal (value of $b_l$) for calculating $y(t)$. For the voltage response during the $k$th cycle, these bits are the controlling signal for the $k$th cycle and the preceding $n - 1$ cycles.

Fig. 5 shows an example of $y_0(t)$. Supposing that one clock cycle is 5 ns, we find that the waveform takes six cycles to reach the steady state. For this example, we depict the waveforms for the six cycles separately and arrange them from top to bottom in Fig. 6. According to (9) and (10), the voltage response $y(t)$ within a given clock cycle can be obtained by selectively superimposing these waveforms. To generate the voltage response during a specified cycle, six consecutive bits of the clock signal are needed, which correspond to the six waveforms in Fig. 6, respectively. For each enabled clock bit, the corresponding waveform is kept. Finally, summing all kept waveforms gives $y(t)$ for the specified cycle.
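The superposition in (9) can be sketched on a uniform time grid as follows. This is a minimal illustration, assuming that $y_0(t)$ is sampled with a fixed number of points per clock cycle and is zero after its last sample; the function name and the sample values are hypothetical.

```python
def superimpose(y0, pattern, samples_per_cycle):
    """Return samples of y(t) = sum_l b_l * y0(t - l*T).

    y0: samples of the one-cycle response, starting at t = 0.
    pattern: clock-gating bits b_0, ..., b_{k-1}.
    The result covers the k cycles described by the pattern.
    """
    length = len(pattern) * samples_per_cycle
    y = [0.0] * length
    for l, bit in enumerate(pattern):
        if not bit:
            continue  # domain sleeps in cycle l, no contribution
        offset = l * samples_per_cycle
        for j, value in enumerate(y0):
            if offset + j >= length:
                break
            y[offset + j] += value
    return y


# Hypothetical response that settles after two cycles of two samples each
y0 = [1.0, -0.5, 0.25, 0.0]
print(superimpose(y0, [1, 0, 1], 2))
print(superimpose(y0, [1, 1], 2))
```

Only the bits of the current cycle and the preceding $n - 1$ cycles influence a given sample, in agreement with (10).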

The earlier derivation considers only the current sources. To account for the supply voltage source, the initial value of the voltage must be added to the obtained waveform $y(t)$.

B. Find the Maximum Voltage Variation Considering Multidomain Clock Gating

We first consider the problem with only one clock domain. From the derivation above, we know that the output \( y(t) \) is calculated by superimposing different portions of the waveform \( y_0(t) \). Given a time point, if only the portions having positive voltage at this point are selected, the superimposed result \( y(t) \) reaches its largest possible value at this point. Then, sweeping all time points in one cycle in this way, we can find the maximum positive voltage variation and the corresponding clock-gating pattern. The situation is similar for finding the maximum negative voltage variation, where we select the waveform portions contributing a negative voltage.

For the problem with multiple clock domains, the aforementioned strategy is still valid with little modification. Suppose that \( y_i(t), i = 1, \ldots, D \), denotes the voltage response at the given node if only current sources in the \( i \)th domain work for the first cycle while other domains are sleeping. Here, \( D \) is the number of clock domains. Then, the voltage response \( y(t) \) corresponding to the arbitrary clock-gating pattern becomes

\[
y(t) = \sum_{i=1}^{D} \sum_{l=0}^{k-1} b_i^{(l)} y_i(t - lT) \tag{11}
\]

where \( b_i^{(l)} \) is the \( l \)th bit of the clock-gating pattern for the \( i \)th domain. If each \( y_i(t) \) takes \( n \) cycles to reach the steady state, we need to arrange all \( n \cdot D \) waveform portions in the manner shown in Fig. 6. Then, the maximum voltage variation can be found like what is done for the single-domain problem.

To describe the algorithm of finding the maximum voltage variation considering the multidomain clock gating, we list the relevant parameters in Fig. 7. In addition, the algorithm for the maximum positive voltage variation is shown in Fig. 8. It is straightforward to give a similar algorithm description for finding the maximum negative voltage variation.

The computational complexity of the algorithm in Fig. 8 is about \( O(N_p n D) \), where \( N_p \) is the number of discrete time points within interval \([0, T]\) and \( n \) is the average number of cycles for \( y_i(t) \) to saturate. The value of \( N_p \) is affected by the desired precision of the obtained time-domain waveform and is usually much larger than \( n \) or \( D \). With this analysis, we know that the algorithm finding the worst-case voltage variation has the linear computational complexity. Moreover, the algorithm consumes little time because it does not involve any complex calculation.
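The selection idea described above can be sketched in Python as follows. This is a minimal illustration of the sweep over time points, domains, and waveform portions, not the authors' listing; the function name, the sampled-waveform representation, and the example data are assumptions.

```python
def worst_case_positive(responses, samples_per_cycle):
    """Find the maximum positive variation and the patterns that cause it.

    responses[i]: samples of y_i(t) for domain i working one cycle; its
    length is assumed to be a multiple of samples_per_cycle.
    Returns (peak, time_index, patterns). patterns[i] lists the bits of
    domain i oldest first; the last bit is the cycle containing the peak.
    """
    best = (float("-inf"), None, None)
    for p in range(samples_per_cycle):          # N_p time points in [0, T)
        total = 0.0
        patterns = []
        for y in responses:                     # D clock domains
            n = len(y) // samples_per_cycle
            bits = []
            for j in range(n):                  # portion j: enabled j cycles ago
                value = y[j * samples_per_cycle + p]
                keep = value > 0                # keep only positive contributions
                if keep:
                    total += value
                bits.append(1 if keep else 0)
            patterns.append(bits[::-1])         # chronological order
        if total > best[0]:
            best = (total, p, patterns)
    return best


# Two hypothetical domains, two cycles each, two samples per cycle
responses = [[1.0, -2.0, -0.5, 3.0], [0.5, 0.5, -1.0, -1.0]]
print(worst_case_positive(responses, 2))
```

The three nested loops make the $O(N_p n D)$ cost explicit. The maximum negative variation can be obtained in the same way by negating every response before the call.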

For the whole analysis flow including simulating voltage response and finding the worst-case clock-gating patterns, the dominant computational complexity is about \( O(N^\alpha D \log f_{\max}) \), where \( \alpha \) is a quantity between one and two.

V. NUMERICAL RESULTS

The proposed simulation method is implemented in C language. The CGS solver from PETSc [14] is used to solve the frequency-domain circuit equation (5), with an ILU preconditioner. A Matlab program is written to take in the frequency-domain responses and convert them to the time-domain voltage waveform with the help of VF [15]. The algorithm in Section IV is implemented with Matlab, which utilizes the simulation results to obtain the worst-case voltage variation and corresponding clock-gating patterns. A parallel program using a message passing interface is also implemented to show the parallelizability of the proposed simulation method.

We first demonstrate the accuracy of the proposed frequency-domain-based simulation method. Then, the numerical results showing the efficiency of the proposed method are presented, including the comparison with two commercial simulators: HSPICE and MSPICE. MSPICE is a fast SPICE simulator from Fastrack, which utilizes an iterative equation solver and is claimed to be two to ten times faster than other SPICE simulators [17]. Finally, the results of the worst-case analysis of power network with multidomain clock gating are presented. The test cases of power network are provided by our industry partner. They are of mesh structure, similar to that in Fig. 1, including \( R, L, C \) elements and current sources. All experiments are run on a four-core machine with 16-GB memory. Each core has a 3.0-GHz Intel Xeon processor.

A. Accuracy of the Proposed Simulation Method

With one of the test cases, we demonstrate the accuracy of our proposed simulation method. For this case, the upper bound of frequency \( f_{\text{max}} \) is set to 4 GHz, and the number of frequency samples is 36. The frequency-domain responses are fitted with the VF technique, where the fitting order \( N_a \) is nine. Fig. 9 shows the result of VF for one output voltage. The root-mean-square error is found to be \( 4.6 \times 10^{-12} \), which means that the frequency-domain response is well approximated by a partial fractional function. In Fig. 10, the time-domain voltage waveform converted from the partial fractional expression is compared with that obtained from transient simulation of HSPICE. The waveforms from both methods match very well.

Let $V_i$ denote the voltage simulated by HSPICE at the $i$th time sampling point, and let $\hat{V}_i$ be the corresponding voltage simulated by the proposed method. We utilize the 1-norm $\|\cdot\|_1$ to measure the average error ratio (AER) of the voltage response waveform

$$\text{AER} = \frac{\sum_i |\hat{V}_i - V_i|}{\sum_i |V_i|}. \tag{12}$$

For the accuracy on the maximum voltage drop, the peak error ratio (PER) is defined as

$$\text{PER} = \frac{\max_i (|\hat{V}_i - V_i|)}{\max_i (|V_i|)}. \tag{13}$$
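The two error measures in (12) and (13) translate directly into code. The following Python sketch uses hypothetical sample values; `v_ref` stands for the reference (HSPICE) samples $V_i$ and `v_hat` for the samples $\hat{V}_i$ of the proposed method.

```python
def average_error_ratio(v_ref, v_hat):
    """AER: sum of absolute errors divided by the sum of |v_ref|."""
    return sum(abs(h - r) for r, h in zip(v_ref, v_hat)) / sum(abs(r) for r in v_ref)


def peak_error_ratio(v_ref, v_hat):
    """PER: largest absolute error divided by the largest |v_ref|."""
    return max(abs(h - r) for r, h in zip(v_ref, v_hat)) / max(abs(r) for r in v_ref)


# Hypothetical waveforms sampled at the same time points
v_ref = [1.0, -2.0, 4.0]
v_hat = [1.1, -2.0, 3.8]
print(average_error_ratio(v_ref, v_hat), peak_error_ratio(v_ref, v_hat))
```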

The accuracy of the proposed simulation method depends on the number of frequency sampling points: more frequency samples give higher accuracy. We manually vary the number of frequency samples from 28 to 40 and draw the corresponding waveforms in Fig. 10. The waveforms with fewer frequency samples are less accurate. The relative errors (AER and PER) for these waveforms are shown in Fig. 11, versus the number of frequency points. This figure shows good accuracy of the proposed simulation method and verifies the correlation between the accuracy and the number of frequency points.

The computational time of the proposed simulation method is proportional to the number of frequency points. In Fig. 11, the curve of CPU time is also plotted with “star” marks.

B. Efficiency of the Proposed Simulation Method

Seven test cases of power network with the node number ranging from 5678 to above one million are used to demonstrate the efficiency of the proposed simulation method. The simulation time is compared with those of HSPICE and MSPICE, as listed in Table I. The time for the proposed method includes only that for solving the frequency-domain equation at the frequency samples. Because the VF and the conversion to a time-domain waveform need only about 0.2 s more per output node, the total CPU time of the proposed method is only slightly larger than that in Table I. From Table I, we see that the proposed method is tens to several hundred times faster than HSPICE (up to 281.7 times for Ckt5), and the speedup over MSPICE ranges from about 6 to 19. For large test cases, the speedup ratios are larger. The AER and PER of the voltage responses obtained from the proposed method are also listed in Table I. They show that the errors are all less than 1%; thus, the proposed method has good accuracy.

<table>
<thead>
<tr>
<th>Name of test case</th>
<th># nodes</th>
<th>Time of proposed method (s)</th>
<th>Time of HSPICE (s)</th>
<th>Speedup to HSPICE</th>
<th>AER to HSPICE</th>
<th>PER to HSPICE</th>
<th>Time of MSPICE (s)</th>
<th>Speedup to MSPICE</th>
</tr>
</thead>
<tbody>
<tr>
<td>Ckt1</td>
<td>5678</td>
<td>1.8</td>
<td>63.0</td>
<td>35.0</td>
<td>0.02%</td>
<td>0.2%</td>
<td>11.6</td>
<td>6.4</td>
</tr>
<tr>
<td>Ckt2</td>
<td>11479</td>
<td>4.1</td>
<td>268.4</td>
<td>65.5</td>
<td>0.04%</td>
<td>0.4%</td>
<td>52.8</td>
<td>12.9</td>
</tr>
<tr>
<td>Ckt3</td>
<td>23011</td>
<td>8.3</td>
<td>622.3</td>
<td>75.0</td>
<td>0.03%</td>
<td>0.3%</td>
<td>70.0</td>
<td>8.4</td>
</tr>
<tr>
<td>Ckt4</td>
<td>46090</td>
<td>17.1</td>
<td>1636.5</td>
<td>95.7</td>
<td>0.03%</td>
<td>0.3%</td>
<td>152.6</td>
<td>8.9</td>
</tr>
<tr>
<td>Ckt5</td>
<td>92155</td>
<td>39.5</td>
<td>11126.5</td>
<td>281.7</td>
<td>0.09%</td>
<td>0.3%</td>
<td>428.7</td>
<td>10.8</td>
</tr>
<tr>
<td>Ckt6</td>
<td>369983</td>
<td>196.7</td>
<td>N.A.</td>
<td>N.A.</td>
<td>N.A.</td>
<td>N.A.</td>
<td>3798.1</td>
<td>19.3</td>
</tr>
<tr>
<td>Ckt7</td>
<td>1156220</td>
<td>815.5</td>
<td>N.A.</td>
<td>N.A.</td>
<td>N.A.</td>
<td>N.A.</td>
<td>N.A.</td>
<td>N.A.</td>
</tr>
</tbody>
</table>

The proposed method is able to handle power networks with more than one million nodes, as shown in Table I. In contrast, HSPICE cannot handle the cases with more than 100,000 nodes, and MSPICE fails on the case with more than one million nodes. From the CPU times in Table I, we can also validate the computational complexity of the employed CGS equation solver, which is about $O(N^{1.14})$ for these test cases.
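The exponent can be cross-checked from the node counts and CPU times in Table I. The sketch below assumes an ordinary least-squares fit of $\log t$ against $\log N$; the paper does not state how the exponent was obtained, so the fitting procedure and the function name are assumptions.

```python
import math

# Node counts and CPU times (s) of the proposed method, copied from Table I
nodes = [5678, 11479, 23011, 46090, 92155, 369983, 1156220]
seconds = [1.8, 4.1, 8.3, 17.1, 39.5, 196.7, 815.5]


def fit_exponent(sizes, times):
    """Least-squares slope of log(time) versus log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


alpha = fit_exponent(nodes, seconds)
print(round(alpha, 2))
```

Under this assumption, the fitted slope lies close to the reported exponent.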

On the machine with four cores, we carried out the experiment of parallel computation. For the seven cases, the computational times of the parallel program and the corresponding serial program are listed in Table II. The speedup ratio is about 2.5 in this experiment. Because it is hard to fully balance the workload and there are other overheads, an ideal 4X speedup with four CPUs is not achieved. Nevertheless, with the parallel computation, the speedup of the proposed method to HSPICE or MSPICE becomes even larger. Moreover, due to the independence of solving (5) for different frequency points, more speedup would be achieved with more CPUs.

C. Analyze the Worst-Case Voltage Variation for Circuits With Multiple Clock Domains

We analyze two power network cases with multidomain clock gating. The first one is “Ckt1” in Table I, which includes four clock domains. The current sources are uniformly distributed in each clock domain and are synchronized with each other. The clock frequency is 200 MHz, and the voltage response for one-cycle current sources takes six cycles to reach the steady state. The four curves in Fig. 12 show the voltage responses with only one clock domain working for one cycle. The node of interest is at the center of domain 1.

In Table III, we present the worst cases of voltage variation considering each clock domain separately and considering all clock domains together. The second column gives the peak voltage in the response waveform caused by current sources working for only one cycle. The third column shows the worst case of voltage variation caused by a sequence of clock signals. The last column includes the corresponding clock-gating patterns. For example, the voltage response caused by one-cycle current sources in domain 1 has a minimum value of \(-10.5\) mV. Then, if the clock pattern for domain 1 is \{1, 1, 1, 0, 1, 1\}, the resulting maximum voltage drop would be 11.8 mV. This means that the variation will be worse if the clock-gating technique is used. The last row in Table III shows the worst case for the actual circuit with four clock domains. The maximum voltage drop will be 15.7 mV, worse than any result considering a single clock domain.

The second case is a large-scale industrial case. This case includes about \(3 \times 10^5\) nodes and \(10^4\) current sources. The circuit is divided into four clock domains, and the clock frequency is 2 GHz. For the proposed simulation method, \(f_{\text{max}}\) is set to 10 GHz. The simulation results show that the voltage response for one-cycle stimulus takes 30 cycles to reach the steady state. Fig. 13 shows the voltage responses with only one clock domain working for one cycle. The maximum voltage drops caused by each domain are 0.15, 1.9, 0.41, and 23.4 mV, respectively. If considering all the four domains together, the worst-case voltage drop is 45.5 mV. This result suggests again that the power network voltage variation will be much larger in circuit with multidomain clock gating.

VI. CONCLUSION

An efficient framework has been proposed for the worst-case analysis of the power network in circuits with multidomain clock gating, which includes two main contributions:

1) A frequency-domain-based transient simulation method is proposed. With the application of the VF technique and iterative equation solver from PETSc library, the frequency-domain-based method is much more efficient than the conventional time-domain simulation method while preserving good accuracy.

2) An algorithm predicting the worst-case clock-gating pattern and the corresponding maximum voltage variation of power network is proposed. The algorithm superimposes the voltage responses caused by a single domain working for one clock cycle and has linear computational complexity.

Numerical results have shown that the proposed simulation method is up to several hundred times faster than commercial simulators like HSPICE and MSPICE. Moreover, the analysis flow is able to handle larger industrial cases with one million nodes. Preliminary results of parallel computation demonstrate larger speedup to the conventional simulation methods and the ease of parallelizing the proposed method. The worst-case analysis results suggest that the power network voltage variation would be much larger if considering the clock-gating signals of multiple domains.

Wanping Zhang (S’06) received the B.S. degree in computer science from the University of Science and Technology of China, Hefei, China, in 2005, and the M.S. degree in computer science from the University of California, San Diego (UCSD), La Jolla, in 2007. He is currently working toward the Ph.D. degree in the Department of Computer Science and Engineering, UCSD. He is also a Software Engineer with Qualcomm Inc., San Diego, CA. His research interests are in very large scale integration CAD, including power network analysis and optimization.

Wenjian Yu (S’01–M’04) received the B.S. and Ph.D. degrees in computer science from Tsinghua University, Beijing, China, in 1999 and 2003, respectively. Since 2003, he has been with Tsinghua University, where he is currently an Associate Professor with the Department of Computer Science and Technology. He visited the Department of Computer Science and Engineering, University of California, San Diego, La Jolla, several times during the period from September 2005 to January 2008. His research interests include parasitic extraction, modeling and simulation of interconnects, and a broad range of numerical methods.

Dr. Yu was a Technical Program Committee member of the ACM/IEEE Asia South-Pacific Design Automation Conference in 2005, 2007, and 2008, and the International Workshop on System Level Interconnect Prediction in 2009. He was the recipient of the distinguished Ph.D. Award from Tsinghua University in 2003 and has served as a reviewer for the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS and the IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES.

Xiang Hu (S’08) received the B.E. and M.S. degrees in electrical engineering from Tsinghua University, Beijing, China, in 2005 and 2007, respectively. He is currently working toward the Ph.D. degree in electrical and computer engineering at the University of California, San Diego, La Jolla.

His research interests are in the areas of analysis and optimization of power distribution networks, and circuit simulation.

Ling Zhang received the B.S. and M.S. degrees in electronic engineering and computer engineering from Tsinghua University, Beijing, China, in 2002 and 2004, respectively, and the Ph.D. degree from the Department of Computer Science and Engineering, University of California, San Diego, La Jolla, in 2009.

Her research interests include on-chip and off-chip high-performance interconnect analysis and optimization, low-skew clock network distribution, on-chip global routing techniques, and innovative logic structure.

Rui Shi received the B.S. degree in electrical engineering and the M.S. degree in computer science from Tsinghua University, Beijing, China, in 1999 and 2002, respectively, and the Ph.D. degree in computer engineering from the University of California, San Diego (UCSD), La Jolla, in 2008, where her research focused on off-chip wire distribution and signal analysis.

After college, she joined Synopsys, Inc., as an Engineer and is working on signal integrity.

He Peng (S’03) received the B.S. degree in electrical engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 2001, and the M.S. degree in computer science from the University of California, San Diego, La Jolla, in 2005, where he is currently working toward the Ph.D. degree. His research interests include transistor-level circuit simulation, parallel circuit simulation, and model order reduction.

Zhi Zhu received the B.S. degree in microelectronic engineering from Tsinghua University, Beijing, China, in 2002, and the M.S. degree in electrical engineering from the University of Massachusetts, Amherst, in 2004.

Since 2004, he has been with Qualcomm CDMA Technology, San Diego, CA, where he is primarily working on power distribution network analysis, low-power design methodology, and high-speed serial-interface circuit design.

Lew Chua-Eoan (M’04) received the B.S. degree in electrical engineering from Cornell University, Ithaca, NY, in 1986, and the M.S. degree in engineering science from Harvard University, Cambridge, MA, in 1990.

In 1986, he developed firmware at the Honeywell-Bull computer division, and in 1988, he worked on and managed design teams at Motorola’s SPS sector and technically contributed to the design of Motorola’s first sigma-delta modulator products and low-power DSPs. In 1992, he was with Motorola’s RISC division, working on the design and development of PowerPC chips at the Somerset Design Center. He has since managed and technically led design engineering teams at Silicon Graphics Inc., Hitachi, CISCO, and MIPS Technologies. He is currently a Principal Engineer/Manager with Qualcomm Inc., San Diego, CA. Most recently, he ran the advance design and low-power integration team within Qualcomm’s QCT division, the main focus of which is the development of low-power techniques on SoC/SIP products and the development of robust sign-off methodologies. In addition, the team is responsible for and oversees the power distribution strategy on all QCT silicon platforms. He is the holder of 15 patents in the design of microprocessors and memory subsystems, sigma-delta converters, low-power circuit techniques, power distribution, and signal integrity. He has coauthored and presented papers on design methodology, the implementation of very large scale integration circuits and subsystems, and sigma-delta applications. His areas of interest are the implementation of low-power multimedia and communication applications and the new enabling technologies for such implementations.

Rajeev Murgai (M’89) received the B.Tech. degree (with highest honors) in electrical engineering from the Indian Institute of Technology, Delhi, India, in 1987, the M.S. degree in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, in 1989, and the Ph.D. degree in electrical engineering and computer sciences from the University of California, Berkeley, in 1993.

He joined Fujitsu Laboratories Ltd., Kawasaki, Japan, in 1994, as a Member of Research Staff and became a Research Fellow in 2004. In 2008, he joined Magma Design Automation, Inc., San Jose, CA, where he is currently the Vice President of Product Development, Design Implementation Business Unit. His research interests include logic/physical synthesis, field-programmable gate arrays, and clock network synthesis.

Dr. Murgai has served on the technical program committees of several leading conferences such as Design Automation Conference, International Conference on Computer-Aided Design, Design, Automation and Test in Europe, Very Large Scale Integrated Circuits Design, and International Symposium on Quality Electronic Design.

Toshiyuki Shibuya (M’08) received the B.E. degree in electrical engineering from Waseda University, Tokyo, Japan, in 1985.

He joined Fujitsu Laboratories Ltd., Kawasaki, Japan, in 1985, and led the research and development of very large scale integration layout algorithms, design for manufacturability, statistical analysis, and parallel computing. He was a Visiting Scholar with the University of California, Los Angeles, from 1995 to 1996. He is currently a Research Fellow with Fujitsu Laboratories of America, Inc., Sunnyvale, CA.

Noriyuki Ito received the B.S. degree in information science from Kyoto University, Kyoto, Japan, in 1982.

He joined Fujitsu Laboratories Ltd., Kawasaki, Japan, in 1982, and subsequently worked on CAD for processor design. He is currently a Director in the Technology Division of Server Systems Group in Fujitsu. He is responsible for the development and delivery of design methodologies and tools of CPUs and chip sets for servers.

Chung-Kuan Cheng (S’82–M’84–SM’95–F’00) received the B.S. and M.S. degrees in electrical engineering from the National Taiwan University, Taipei, Taiwan, and the Ph.D. degree in electrical engineering and computer science from the University of California, Berkeley, in 1984.

From 1984 to 1986, he was a Senior CAD Engineer with Advanced Micro Devices Inc. In 1986, he joined the University of California, San Diego (UCSD), La Jolla, where he is currently a Professor with the Department of Computer Science and Engineering and an Adjunct Professor with the Department of Electrical and Computer Engineering. He served as a Chief Scientist with Mentor Graphics in 1999. He was appointed as an Honorary Guest Professor of Tsinghua University during 2002–2008. His research interests include medical modeling and analysis, network optimization, and design automation on microelectronic circuits.

Dr. Cheng was an Associate Editor of the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN from 1994 to 2003. He was a recipient of the best paper awards in the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN in 1997 and in 2002; the NCR Excellence in Teaching Award in the School of Engineering, UCSD, in 1991; and the IBM Faculty Awards in 2004, 2006, and 2007.