# A 90-nm Radiation Hardened Clock Spine

Srivatsan Chellappa, Lawrence T. Clark, Senior Member, IEEE, Keith E. Holbert, Senior Member, IEEE

*Abstract*—A RHBD clock distribution network is described that reliably synchronizes the flow of signals through an integrated circuit in the presence of SETs. The clock spine design controls both redundant and non-redundant hardened circuits. The design uses techniques to reduce the jitter due to SETs, as well as error detection at every clock edge, since errors may be in the clock gating enables rather than the clocks themselves. The clock spine has been fabricated and tested on both standard and low power 90-nm test chips, and proven hard as demonstrated by both heavy ion and proton broad beam testing.

*Index Terms*—Radiation hardening by design, clock generation, single event upset, single event transients.

## I. INTRODUCTION

WITH progressively scaled fabrication technologies, integrated circuit (IC) dimensions and power supply voltages decrease. As a consequence of this scaling, the drive currents provided by reduced width transistors have diminished, while the ion deposited charge remains constant. This has made circuits increasingly vulnerable to single event effects (SEEs), i.e., soft errors [1], [2]. Such radiation induced failures are particularly important in mission critical applications in biomedical, space and military operations. Since all circuits are controlled by the clock network, its hardening is critically important. This paper details the essential clock distribution functions in an IC, their soft-error vulnerabilities, and how they are mitigated with our proposed radiation hardening by design (RHBD) approaches.

## A. Contribution of This Work

Though single event effects on CMOS combinational and sequential circuits have been studied and analyzed in great detail [3]-[5], their effects on clock networks have not been widely addressed, and then only through simulation. In this paper, we describe a radiation hardened clock spine design and its experimental verification. The design is not only tolerant to most SEEs, but also detects single event upsets (SEUs) and single event transients (SETs) causing incorrect clock assertions. This is useful for characterization, but more importantly allows error recovery or mitigation in circuits where logic upsets cause clocks to be asserted, or to fail to assert, such as in memory circuits. The clock spine was fabricated on two 90 nm test chips and tested for circuit hardness in the presence of protons and heavy ions with LET exceeding 100 MeV-cm<sup>2</sup>/mg.

Fig. 1. (a) Tree based clock; (b) clock grid; (c) clock grid with clock gating. Clock gating can be added to the tree as well, and may be implemented at multiple levels.

## B. Paper Organization

This paper is organized as follows. Section II provides the background on basic clock generation and distribution, and radiation effects in clocks. Section III describes our proposed approaches to RHBD clock distribution and its design issues. Section IV describes the test chip and provides experimental results from broad beam heavy ion and proton testing, and Section V concludes the paper.

#### II. BACKGROUND AND PRIOR WORK

#### A. VLSI Clocking

Clock generation and distribution are essential to CMOS microchips, providing synchronization to external devices and between internal sequential logic. As all sequential logic is designed to change state in temporal reference to the clock, minimizing the clock delay and clock skew between different points in the chip is the goal of a robust clock distribution network design.

A phase locked loop (PLL) usually generates the clock at the desired frequency that is then distributed to the whole chip through the clock distribution network. Additionally, the PLL usually matches the phase (phase locks) of the internally generated clock, which is usually a multiple of the external clock, to that of the external clock.

The clock source must then be distributed to the many thousands of sequential circuits (sinks) on the chip, ideally all receiving the clock at the same time. Clock timing skew is both systematic and random, but the former can be mitigated through good design. The most common clock distribution strategy is a tree (see Fig. 1(a)). The clock paths are buffered, starting from the source, in an H-tree configuration using delay matched paths. The buffer drive strengths and the capacitance of each node are progressively decreased as the signal propagates to the lower levels of hierarchy in the tree. In the mesh configuration, shown in Fig. 1(b), the buffered clock tree signals are shunted to create a clock grid or mesh structure. This grid structure has the advantage that, since the branch resistances are in parallel, the effective clock skew is minimized: differences between driver circuits, whether designed in or due to effects such as power supply noise, are averaged out.

Manuscript submitted Sept. 13, 2011. This work was supported by Air Force Research Laboratory, Space Vehicles Directorate, Kirtland AFB, NM.

S. Chellappa, L. T. Clark and K. E. Holbert are with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287-5706 USA (e-mail: Srivatsan.Chellappa@asu.edu, Lawrence.Clark@asu.edu, Keith.Holbert@asu.edu).

Finally, in many designs, the entire distribution network is compressed into a "clock spine" or multiple spines. A key advantage of a spine is that by not spatially distributing it, the mesh and/or tree structures have no interference from other circuits, and it is easy to design in parallel with other blocks. A disadvantage is that the final clock outputs must travel further to the clock sinks, i.e., the driven sequential elements, while maintaining balanced skew.

Power dissipation increasingly limits modern microchips. Since gating the clocks eliminates active power dissipation in the sequential and intervening combinational logic, it is the most effective low power technique. Gated clocks must follow the tree topology, since individual branches are gated independently, as evident in Fig. 1(c). Thus, even mesh networks generally end in a tree as shown.

#### B. Soft Errors in CMOS

High energy particles create charge carriers as they travel through the IC. The deposited charge is collected by source and drain diodes of the CMOS transistors. Since digital CMOS ICs carry information in two discrete voltage levels V<sub>DD</sub> and V<sub>SS</sub>, the collected charge may upset the circuit logic state. In purely combinational circuits this effect is temporary since the combinational logic restores the original state in some finite time, i.e., a SET. This erroneous state may be latched in a storage element and propagated to other levels causing an architecturally visible error. Otherwise, such SETs are benign. SET amplitude is primarily a function of the affected nodes' capacitance. SETs can be prevented by a capacitance sufficient to absorb the charge without producing a voltage amplitude sufficient to switch a receiving gate. Once the SET exceeds this threshold, gates driven by the affected node will respond if the duration is sufficiently long. The duration is primarily a function of the strength (transistor width) of the driving transistors [6].

The amount of charge deposited by a radiation particle per unit time (Q) is a function of its linear energy transfer (LET) and its charge collection depth. The clock network itself is combinational logic and thus only prone to SETs. However, clock gating is commonly used, and the gating uses latches to ensure hold times, as well as being affected by the controlling circuit state. Thus, clocks may also be affected by SEUs [7]. SEEs affect the clock network primarily by producing three particular modes of circuit failure:

*Radiation Induced Clock Jitter*—when the radiation particle induced charge is collected close to the clock edge, the clock edge may deviate from its expected transition time causing increased clock jitter. Basically, the driving transistors' current to the clock node is enhanced or diminished by the amount of collected charge. The net effect is a timing push out or pull in of the affected edge. This directly affects the clock jitter, requiring larger setup and hold times to accommodate the worst-case impact.

*Clock Glitching*—the collected charge causes the clock to transition to the wrong state, introducing a new clock edge. In edge sensitive circuit paths, this can lead to the wrong data being sampled. For flip-flops, this can induce what are essentially setup or hold time errors in the subsequent logic. This is referred to as radiation induced race in [8], [9]. Edge triggered circuits are not limited to flip-flops. Memory circuits pre-charge dynamic bit lines in one clock phase, reading or writing in the other clock phase. At high clock frequencies, these high capacitance nodes, e.g., the bit lines, require the entire clock phase for correct operation. Clock glitches interrupt these operations and may produce incorrect results that may, in turn, not be captured by error detection and correction schemes [10].

Since local clock nodes generally feed more than one sequential element, a radiation strike on one such node may cause multiple errors to be propagated. It has been shown that even hardened flip-flops, e.g., those using DICE storage [11], may be prone to errors induced on high fan-out nodes such as clocks and resets [12], [13]. Designs that are not prone to such upsets are larger and dissipate more power [14], [15].

*Incorrect Clock Assertions in Gated Clocks*—all modern commercial designs save power by using clock gating, which is, in general, more effective at finer granularities. Basically, clocks to circuits that are unused in a given clock cycle are gated off, which eliminates both sequential circuit (flip-flop and latch) and combinational logic power, as shown in Fig. 1(c). It is possible to "grid" clocks at all levels, which reduces skew, and more importantly in a radiation tolerant design, increases clock node capacitance and thus SET resilience. However, fine-grained clock gating results in relatively low drive, low capacitance clock nodes, which are more susceptible to SETs. Moreover, clock gating requires latching the clock enables through the active clock phase. These latches are susceptible to SEUs, and the logic path controlling the gating is susceptible to SETs. Depending on the sequential circuit approach used, hardened designs may avoid clock gating because of issues with clock SETs [16].
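The role of the enable latch, and the way an upset in it produces a spurious clock, can be illustrated with a small behavioral model. This is a minimal sketch, assuming a generic latch-based clock gate (a latch that is transparent while the clock is low, followed by an AND gate); it is not the circuit used in this work, and the names `latch`, `gated_clock` and `seu_at` are hypothetical.

```python
def latch(d, clk, q_prev):
    """Low-transparent latch: follows d while clk is 0, holds while clk is 1."""
    return d if clk == 0 else q_prev


def gated_clock(clk_wave, en_wave, seu_at=None):
    """Return the gated clock for sampled clock and enable waveforms.

    seu_at is an optional sample index at which the latch state is flipped,
    standing in for an SEU in the enable latch (an illustrative assumption).
    """
    q = 0
    out = []
    for i, (clk, en) in enumerate(zip(clk_wave, en_wave)):
        q = latch(en, clk, q)
        if seu_at == i:
            q ^= 1  # modelled upset of the stored enable
        out.append(clk & q)
    return out


# Enable drops in the middle of the high phase: the latch holds it,
# so the clock pulse is not truncated.
held = gated_clock([0, 1, 1, 0, 1, 1], [1, 1, 0, 0, 0, 0])

# Clock gated off, but the latch is upset during a high phase:
# a clock pulse appears that the logic never requested.
upset = gated_clock([0, 1, 0, 1, 0, 1], [0, 0, 0, 0, 0, 0], seu_at=3)
```

In the second case the gated clock asserts for one sample even though the enable was never raised, which is the kind of incorrect assertion that must be either tolerated by redundant logic or detected.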

It has been shown that for commercial circuit architectures using master-slave flip-flops, SEEs affecting clock nodes contribute to about 20% of the total chip errors while the number rises to about 90% for pulse latch based designs [8]. Thus robust radiation hardened clock networks are crucial.

The clock network has considerable overall capacitance, and a primary function of the network is to increase the overall drive through many stages, so that the large overall capacitance presented by the sequential circuits can be driven with good edge rates. Consequently, the hardening approach must differ throughout the clock network. At early nodes with relatively low capacitance, limiting the impact of collected charge is important. In contrast, the final global clock nodes have very large drive and capacitance, making them essentially immune by their size. At the end of the network, where individual clocks are gated, errors must be detected, or mitigated by the receiving circuits themselves.

Fig. 2. The clock distribution network. Note that the signals E5Gclk, E4Gclk, E3Gclk, E2Gclk and E1Gclk are large nodes that are shorted throughout the chip and hence have large node capacitances; 23 wires are used to equally distribute the E5Gclk signal throughout the clock spine to reduce node delay.

## III. PROPOSED DESIGN OF THE RAD-HARD CLOCK DISTRIBUTION NETWORK

The RHBD clock spine network described here is divided into two sections: the global clock distribution network, which carries the clock in an H-tree configuration throughout the spine itself, and the large number of local clocks that synchronize the IC logic. The latter can be further gated at the circuit level, in triple mode or dual mode redundant (TMR or DMR) logic circuits in the test die. The circuit approach to designing the global clock network is based on fault-tolerance. The global clock nodes have large capacitances and are driven by a large number of spatially distributed drivers. An SET on these global nodes cannot be detected, but because of the large capacitance its impact is negligible. The approach to designing the local clock drivers emphasizes fault detection: an SET on the clock or the enable signals is detected, and a corresponding error signal is produced to indicate a faulty clock to the subsequent non-redundant circuits, so that the error can be dealt with locally.

## A. Global Clock Distribution

Referring to Fig. 2, the clock signal from the PLL is buffered by 102 spatially dispersed drivers, over 10 $\mu$m apart, that drive a common node E5Gclk, which is the 5<sup>th</sup> stage early global clock. Any irradiation at this early stage of the clock spine will affect all clocks. The multiple drivers ensure that an SET at any one of these stages disturbs less than 1% of the drive of the E5Gclk node, which limits the resulting jitter. This is demonstrated in Fig. 3, where an SET at the EGclk global clock node produces negligible phase noise (less than expected from power supply noise). Fig. 4 shows the effect of an SET on one of the driver nodes that buffer the clock, resulting in less than 1 ps of jitter, which is considerably smaller than that due to the PLL. Thus, high capacitance global clocks have sufficient drive and capacitance to be essentially SET immune. Hardened design must then focus on the lower capacitance root and the gated leaves.

Fig. 3. Radiation induced jitter on the EGclk (a global clock node). A simulated SET with charge equivalent to LET of 30 MeV-cm<sup>2</sup>/mg produces clock jitter of less than 1 ps.

Fig. 4. An SET strike at the input of one of 102 inverters in the PLL-to-E5Gclk buffer produces a jitter of 0.63 ps at the input of the clock spine. The solid line shows the E5Gclk node without the SET, while the dashed line shows E5Gclk as affected by the SET. There is 10 $\mu$m separation between inverters.
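The "less than 1%" figure is consistent with simple arithmetic on the driver counts. The sketch below assumes that all parallel drivers on a node are identical and contribute equal drive, which the text does not state explicitly; the function name `single_driver_share` is hypothetical.

```python
def single_driver_share(n_drivers: int) -> float:
    """Fraction of a node's total drive supplied by one of n identical drivers."""
    if n_drivers <= 0:
        raise ValueError("n_drivers must be positive")
    return 1.0 / n_drivers


# 102 inverters drive E5Gclk; one upset inverter is under 1% of the drive.
share_e5gclk = single_driver_share(102)
print(f"E5Gclk: one of 102 drivers supplies {share_e5gclk:.2%} of the drive")
```

Under the same assumption, the fewer drivers a node has, the larger the share disturbed by a single strike, which is why spatial separation of the drivers matters as much as their number.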

In the test chips, the clock begins at an unhardened PLL (foundry supplied) that is shielded during testing. The primary spine input, the E5Gclk, is buffered up by spatially distributed inverters driven in the clock spine by 23 wires that distribute and buffer the PLL clock output to reduce clock skew due to wire RC in different sections of the spine. The E5Gclk is buffered to the local clock networks using 5 inversion stages, each connecting to a global clock node (E4Gclk, E3Gclk, E2Gclk, E1Gclk and EGclk). Each of these global nodes is driven from the previous stage by 38 spatially dispersed inverters that also provide sufficient drive fan-up to supply the large number of local clock networks. The interconnection of these stages is a mesh, which, as mentioned, limits skew, while driving increasing capacitance in these un-gated levels. This large capacitance and drive make them essentially immune to SET induced glitches.

## B. Local Clock Driver Design

The local clock networks produce the local clocks for each of the IC logic sub blocks. The local clocks can each be enabled or disabled as needed to conserve power by two enables, Enclk and Engclk, the local and global enables respectively, as shown in Fig. 5(a). The transparent latches are required to hold the enables through the clock high assertion. The latches also represent the primary SEU cross-section in the clock spine. Additionally, due to their smaller drive and capacitance, the enable signals are vulnerable to SETs. When driving TMR logic, a clock can be asserted or fail to assert without causing a logic error since the TMR logic is self-correcting. When driving non-redundant circuits, e.g., memories, the local clocks and drivers have checking circuits that detect when an inadvertent clock edge occurs, whether due to an SET or SEU in the controlling latches. These produce a Clock\_hit signal to indicate such a clock error event. The checker compares the local clock signal by XORing it with a redundant copy of its corresponding enable (from one of the redundant control logic copies) to determine if the local clock was correctly asserted. The design of the checker circuit is shown in Fig. 5(b). The memory circuits respond to an inadvertent clock or clock assertion failure as described in [10], which also describes local checking in the cache circuits for phase and other timing errors.

Fig. 5. (a) The local clock network that produces the individual clock signals from the EGclk. (b) Schematic of the local clock checker circuit.
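The checking principle can be sketched at the logic level. This is a simplified model, not the circuit of Fig. 5(b): it assumes the expected local clock is the global clock ANDed with the redundant enable copy, and that the checker samples while the clock and enables are stable. The function name `clock_hit` and its arguments are hypothetical.

```python
def clock_hit(egclk: int, local_clk: int, en_copy: int) -> int:
    """Return 1 when the local clock disagrees with what the enable copy predicts.

    egclk     : global clock level (0 or 1)
    local_clk : observed local (gated) clock level
    en_copy   : redundant copy of the enable, from separate control logic
    """
    expected_clk = egclk & en_copy   # what the clock should be, per the copy
    return local_clk ^ expected_clk  # XOR flags any mismatch


# Correct operation: enabled and asserted, or disabled and quiet.
ok_asserted = clock_hit(egclk=1, local_clk=1, en_copy=1)
ok_gated = clock_hit(egclk=1, local_clk=0, en_copy=0)

# Spurious assertion: the clock fired although the copy says it was gated off.
spurious = clock_hit(egclk=1, local_clk=1, en_copy=0)

# Failure to assert: the clock stayed low although the copy says it was enabled.
missing = clock_hit(egclk=1, local_clk=0, en_copy=1)
```

Because the comparison is symmetric, an upset of the enable copy alone also raises the flag, which is the source of the false Clock\_hit events discussed in the text.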

The local clock networks and their corresponding checkers are spatially separated to prevent multiple node charge collection from upsetting two critical nodes in the same circuit at the same time. An SET on the copy of the enable signal would trigger a false Clock\_hit signal when the original enable would not have been affected. Such false errors trigger an error response in the affected memory circuits, but this cross-section is relatively small.

Fig. 6. Simulation waveforms of SET hits at different parts of the local clock network shown on different clock cycles. Note that false clock hit signals are produced when the copy of the enable signal is hit. Since the enable and its copy are spatially separated, the probability of an SET affecting both nodes simultaneously is very small. Other SETs that occur at the local clock edge are correctly detected by the checker.

Fig. 7. The layout of the complete clock spine with the PLL to E5Gclk buffers. The global clock nodes (E5Gclk, E4Gclk, E3Gclk, E2Gclk and E1Gclk) are laid out as a grid to minimize skew.

The possible radiation induced errors in the local clock network and the checker circuits are simulated in Fig. 6. Any SET on the enable or the local clock path at the falling edge of the clock is detected by the checker. The models of the SETs used to carry out the SPICE simulation are implemented in Verilog-A following the approaches in [17], [18].

#### C. Clock Spine Physical Design

The test chip clock is generated by an unhardened, foundry supplied PLL. The PLL controls are TMR and voted at the periphery. The single root clock generated by the PLL is buffered up by a distributed buffer composed of 102 inverters. The buffer outputs are shunted to create the E5Gclk node that is then connected to the center of the spine using 23 wires spaced about 78 $\mu$m apart as shown in Fig. 7. The buffers are separated by 10 $\mu$m to minimize the likelihood of any two internal nodes being affected by an SET at the same time. The global nodes E5Gclk, E4Gclk, E3Gclk, E2Gclk, E1Gclk and EGclk have 2.76 pF, 2.98 pF, 3.83 pF, 5.92 pF, 7.6 pF and 12.69 pF of total capacitance, respectively. These nodes are laid out as a grid inside the spine as shown on the right side of Fig. 7.

The 152 primary outputs of the clock spine, signals Clk<151:0> and Clkx<151:0> are generated by the local clock distribution network that controls the clocks with the individual enables. The Clkx copy clocks are used in the checkers to validate the original copy of the clock. Their enables are generated by redundant logic. The local clock nodes (Gclks, local Clks) have relatively smaller capacitances than the global nodes (E5Gclk, E4Gclk, E3Gclk, E2Gclk, E1Gclk and EGclk). The single stage Gclk node has about 34.17 fF while the final clock out from the spine has about 48.11 fF. These smaller capacitances necessitate the clock error checking circuitry discussed earlier. Systematic skew at the Gclks is less than 1 ps.
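The contrast between global and local nodes can be made concrete with a first-order charge estimate. The sketch below is an upper bound under stated assumptions: it uses the common silicon approximation of roughly 0.0103 pC/$\mu$m of deposited charge per MeV-cm<sup>2</sup>/mg of LET (about 1 pC/$\mu$m at 97 MeV-cm<sup>2</sup>/mg), assumes a hypothetical 1 $\mu$m collection depth, and ignores the restoring current of the drivers, so it is not a substitute for the SPICE simulations in the paper.

```python
PC_PER_UM_PER_LET = 0.0103  # pC/um per MeV-cm^2/mg in silicon (approximation)


def deposited_charge_pc(let: float, depth_um: float) -> float:
    """Charge (pC) deposited along depth_um of track at the given LET."""
    return PC_PER_UM_PER_LET * let * depth_um


def voltage_step_v(charge_pc: float, cap_pf: float) -> float:
    """Voltage disturbance dV = Q / C on an undriven node (pC / pF = V)."""
    return charge_pc / cap_pf


# LET of 30 MeV-cm^2/mg (as in Fig. 3) with an ASSUMED 1 um collection depth.
q = deposited_charge_pc(let=30.0, depth_um=1.0)

dv_global = voltage_step_v(q, cap_pf=12.69)    # EGclk, 12.69 pF
dv_local = voltage_step_v(q, cap_pf=0.03417)   # single stage Gclk, 34.17 fF

print(f"Q = {q:.3f} pC")
print(f"EGclk disturbance ~ {dv_global * 1e3:.0f} mV")
print(f"Gclk disturbance would exceed the 1.2 V supply: {dv_local > 1.2}")
```

Under these assumptions the global node moves by only a few tens of millivolts, while the same charge on a local node is enough to swing it rail to rail, which matches the qualitative argument for detection rather than immunity at the leaves.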

All NMOS transistors in the clock spine use annular layout to mitigate total ionizing dose induced standby leakage power increases. P type guard rings are also used to isolate diffusions and the N wells for single event latchup mitigation. Since most of the devices are large, and the spine has considerable white space to allow driver separation, there is no added area cost. Empty space is filled with decoupling capacitance, which reduces the power supply noise and its impact on jitter.

Fig. 8. Die microphotograph with the clock spine layout overlaid.

Fig. 9. Clock spine driving different kinds of logic circuitry in the test chip.

#### IV. EXPERIMENTAL VERIFICATION

## A. Test Chip Design

The RHBD clock spine was designed and fabricated on both the IBM trusted foundry standard and low standby power 90 nm processes. All the logic circuits including the I/O are implemented as TMR circuits that self-correct by voting except the standard foundry provided non-rad-hard PLL used to generate the clock. The PLL was shielded from the radiation particles during testing. The spine was clocked to a maximum of 1 GHz with  $V_{DD} = 1.2$  V. The die microphotograph and the test die floor plan are shown in Fig. 8.

The test chips contain three types of logic circuits (see Fig. 9). First, the test engine uses fine grained TMR logic [19]. Each of the TMR logic copies is clocked by a separate clock, so if one clock incorrectly asserts, or fails to assert, the logic operation is unaffected. Secondly, DMR logic is protected by local error detection, which allows operations to restart. In this logic, if a clock is incorrectly asserted and affects the operation result, the resulting state is cleaned up and the operation restarted [20]. Third, most clocks drive cache memory banks. The aforementioned clock checking determines if clocks are inadvertently asserted or fail to assert, which would impact the cache operation. In the cache, there is also extensive error checking [10] that protects from short clocks and glitched clocks, as well as other periphery SET errors.

Fig. 10. (a) The heavy ion beam test board at the Texas A&M University cyclotron. (b) Proton test setup at the Lawrence Berkeley Labs cyclotron.
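The reason a single misbehaving clock does not disturb TMR logic can be seen from the voting function itself. The following is a generic bitwise majority voter, given as a minimal sketch; it is not the self-correcting circuit of [19], and the name `majority` is hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote over three redundant copies."""
    return (a & b) | (b & c) | (a & c)


# Copies A and B captured the correct word; copy C captured a stale or
# corrupted word because its clock misfired. The vote recovers the value.
good = 0b1010
corrupted = 0b0110
voted = majority(good, good, corrupted)
```

Since each copy has its own clock, a spurious or missing edge corrupts at most one input to the vote, and the voted result is unchanged.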

#### B. Test Setup

The device under test (DUT) is mounted on a daughter card that is controlled by an FPGA board, which manages the testing. A divided clock is output to provide test visibility to the internal clock and to ensure adequate jitter performance. During each beam test, the programmable test engine runs different tests on the chip while all logic state, including the clock checking circuits, is monitored.

#### C. Heavy Ion Testing

The test die was exposed to heavy ion beams at the cyclotron at Texas A&M University in air at room temperature using N, Ne, Cu, Ar, Kr and Au ions. Fluences from  $5 \times 10^5$  to  $2 \times 10^7$  particles/cm<sup>2</sup> were used. Beam angles from 0° to 79° (0° being the normal incidence) were used. The beam was incident on the die front (metallization) side. The effective LET ranged from 1.4 to 219.8 MeV-cm<sup>2</sup>/mg. For high speed I/O, the daughter board and the FPGA were mounted one below the other as shown in Fig. 10(a).
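Tilting the die raises the effective LET because the ion traverses a longer path through the sensitive layer. The sketch below uses the conventional cosine law, $\mathrm{LET_{eff}} = \mathrm{LET}/\cos\theta$, which assumes a thin sensitive volume; the paper does not state the formula it used, and the function name and the example LET value are hypothetical.

```python
import math


def effective_let(let_normal: float, angle_deg: float) -> float:
    """Effective LET (MeV-cm^2/mg) at a tilt angle measured from normal incidence."""
    if not 0.0 <= angle_deg < 90.0:
        raise ValueError("angle must be in [0, 90) degrees")
    return let_normal / math.cos(math.radians(angle_deg))


# Hypothetical ion with a normal-incidence LET of 50 MeV-cm^2/mg.
at_normal = effective_let(50.0, 0.0)   # unchanged at 0 degrees
at_60_deg = effective_let(50.0, 60.0)  # doubled at 60 degrees

# Multiplier available at the largest angle used in the tests (79 degrees).
gain_at_79 = effective_let(1.0, 79.0)
```

At 79° the multiplier is roughly 5.2, which is how a limited set of ion species can cover a wide range of effective LET.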

#### D. Proton Testing

The clock spine was tested using 13.5 MeV/u and 49.3 MeV/u proton beams at the 88-inch cyclotron at Lawrence Berkeley National Laboratory. The FPGA test board was separated from the beam using 3-foot cables (see Fig. 10(b)). The cables and the inability to shield the PLL from the protons limited the clock frequency, primarily by requiring PLL bypass, to 100 MHz.

#### E. Experimental Results

While over 50,000 errors were detected in the memories, only one clock spine error was reported by the DUT during testing. This occurred using Au ions, with an angle of  $53^{\circ}$ , i.e. at an LET<sub>eff</sub> = 152 MeV-cm<sup>2</sup>/mg. In the bypass mode, no clock spine errors were found in the proton tests, although errors due to clock SETs at the clock root (prior to E5Gclk) were detected when running with the (unhardened) PLL on, leading us to conclude that the PLL was susceptible to upsets.

## V. CONCLUSION

This paper describes a RHBD clock spine and distribution network design with circuits for detecting radiation induced errors. The proposed clock network was fabricated and tested using two 90 nm process variants. The design was exposed to heavy ions and protons and was experimentally found to be hard to over 100 MeV-cm<sup>2</sup>/mg. The test results demonstrate the clock spine radiation hardening techniques are effective.

#### VI. ACKNOWLEDGMENT

The authors thank Dr. X. Yao for contributions to this work.

#### REFERENCES

- [1] N. Seifert, X. Zhu, and L. W. Massengill, "Impact of scaling on soft-error rates in commercial microprocessors," *IEEE Trans. Nucl. Sci.*, vol. 49, pp. 3100-3106, Dec. 2002.
- [2] R. Baumann, "The impact of technology scaling on soft error rate performance and limits to the efficacy of error correction," *Intl Digest on Electron Devices Meeting*, Dec. 2002, pp. 329-332.
- [3] W. Beauvais, P. McNulty, W. A. Kader, and R. Reed, "SEU parameters and proton-induced upsets," in *Proc. Sec. European Conf. on Radiation and its Effects on Components and Systems*, pp. 540-545, Sept. 1993.
- [4] W. Massengill, M. Alles, and S. Kerns, "SEU error rates in advanced digital CMOS," in *Proc. Sec. European Conf. on Radiation and its Effects on Components and Systems*, pp. 546-553, Sept. 1993.
- [5] N. Seifert, et al., "Radiation-induced soft error rates of advanced CMOS bulk devices," Proc. Int. Phys. Reliab. Symp, March 2006, pp. 217-225.
- [6] D. Kobayashi, T. Makino, and K. Hirose, "Analytical expression for temporal width characterization of radiation-induced pulse noises in SOI CMOS logic gates," *Proc. IRPS*, pp. 165-169, 2009.
- [7] K. Hass and J. Gambles, "Single event transients in deep submicron CMOS," in *Proc. IEEE 42nd Midwest Symp. on Circuits and Systems*, vol. 1, Aug. 1999, pp. 122-125.
- [8] N. Seifert, et al., "Radiation induced clock jitter and race," Proc. Int. Phys. Rel. Symp., April 2005, pp. 215-222.
- [9] R. Dash, R. Garg, S. P. Khatri and G. Choi, "SEU hardened clock regeneration circuits," *Intl Symp on Quality of Elec. Design*, 2009, pp. 806-813.
- [10] X. Yao, L. Clark, D. Patterson, and K. Holbert, "A 90 nm bulk CMOS radiation hardened by design cache memory," *IEEE Trans. Nucl. Sci.*, vol. 57, no. 4, pp. 2089-2097, Aug. 2010.
- [11] T. Calin, M. Nicolaidis, and R. Velazco, "Upset hardened memory design for submicron CMOS technology," *IEEE Trans. Nucl. Sci.*, vol. 43, no. 6, pp. 2874-2878, Dec. 1996.
- [12] K. Warren, et al., "Heavy ion testing and single event upset rate prediction considerations for a DICE flip-flop," *IEEE Trans. Nucl. Sci.*, vol. 56, no. 6, pp. 3130-3137, Dec. 2009.
- [13] J. Knudsen and L. Clark, "An area and power efficient radiation hardened by design flip-flop," *IEEE Trans. Nucl. Sci.*, vol. 53, no. 6, pp. 3392-3399, Dec. 2006.
- [14] D. Mavis and P. Eaton, "Soft error mitigation techniques for modern microcircuits," *Proc. IEEE IRPS*, Aug. 2002, pp. 216-225.
- [15] B. Matush, T. Mozdzen, L. Clark and J. Knudsen, "Area efficient temporally hardened by design flip flop circuits," *IEEE Trans. Nucl. Sci.*, vol. 57, no. 6, pp. 3588-3595, Dec. 2010.
- [16] K. Mohr and L. Clark, "Experimental characterization and application of circuit architecture level single event transient mitigation," *Proc. IRPS*, pp. 312-317, 2007.
- [17] H. Cha and J. H. Patel, "A logic-level model for particle hits in CMOS circuits," *Intl Conf. on Computer Design*, pp. 538-542, Oct. 1993.
- [18] D. Fulkerson, D. Nelson, R. Carlson, and E. Vogt, "Modeling ion-induced pulses in radiation-hard SOI integrated circuits," *IEEE Trans. Nucl. Sci.*, vol. 54, no. 4, pp. 1406-1415, Aug. 2007.
- [19] N. Hindman, D. Pettit, D. Patterson, K. Nielsen, X. Yao, K. Holbert and L. Clark, "High speed redundant self correcting circuits for radiation hardened by design logic," *Proc. RADECS*, 2009, pp. 465-472.
- [20] L. Clark, D. Patterson, N. Hindman, K. Holbert, and S. Guertin, "A dual mode redundant approach for microprocessor soft error hardness," to appear in *IEEE Trans. Nucl. Sci.*, Dec. 2011.