Fully redundant clock generation and distribution with dynamic oscillator switchover
M. J. Mueller
L. C. Alves
W. J. Clarke
This paper describes a fully redundant clock generation and
distribution approach with a fully dynamic switchover capability
and concurrent repair. It highlights the challenges as the design
evolved from a single source, to a “cold” standby backup, and
finally to a fully redundant transparent switchover with no
interruption of the workloads running on an IBM System z9™.
The function split between hardware and the various levels of
firmware is described, including the methods to determine the
defect component in the clock distribution paths. Finally, we
describe the joint effort with a major chip technology vendor
to design and develop the necessary circuitry, according to the
z9™ requirements, for clock synchronization and switching.
Each zSeries* generation is evaluated to determine the most significant sources of outages, and design changes are then made to remove these sources. The traditional single-source clock generation design was inherently a single point of failure, which was addressed in the IBM zSeries 990 by providing two independent oscillator cards, each housed in a separate field-replaceable unit (FRU). This design does not prevent the system from going down when the oscillator fails, but it does enable a restart on the other oscillator card and a subsequent concurrent repair of the failed card. The detection of failures in the oscillator signal distribution was essentially limited to a total loss of pulse or signal. The new design for the IBM z9* improves the detection of many clock signal failure modes, provides a dynamic switchover to the alternate oscillator without any disruption to the application programs that are running, and enables full concurrent repair.
Clock generation and distribution design
The conventional clock signal is usually generated
by a crystal oscillator with a relatively low frequency.
Depending on the system requirements, this frequency is
multiplied by means of a phase-locked loop (PLL) to a
higher frequency, which allows distribution over different
package levels—modules, cards, and boards—to the
processor chips. The upper limit of this frequency is
defined by the card and board material and the length of
the oscillator distribution wires. A buffer circuit sends
individual oscillator signals to all processors. Finally, an
additional PLL, located on the processor chip, multiplies
the oscillator signal to the frequency required by the
processor. New system requirements, such as
programmable oscillator frequency, result in additional
components and reduce the reliability of the oscillator
clock signal generation and distribution. They also
increase the complexity of this function and the risk for
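As a rough illustration of the multiplication chain described above, the sketch below follows a clock from the crystal through the oscillator-card PLL to the on-chip PLL. Only the 16-MHz crystal frequency is taken from the z9 description later in this paper; the multipliers and the board-level frequency limit are hypothetical placeholders, not z9 values.

```python
"""Hypothetical sketch of a crystal -> card PLL -> on-chip PLL frequency chain."""
from typing import Tuple

CRYSTAL_HZ = 16e6  # 16-MHz crystal, as on the z9 oscillator card


def frequency_chain(crystal_hz: float, card_pll_mult: int, chip_pll_mult: int,
                    max_distribution_hz: float) -> Tuple[float, float]:
    """Return (distribution_hz, processor_hz) for the given PLL multipliers.

    max_distribution_hz stands in for the limit set by the card and board
    material and the length of the distribution wires (an assumed value).
    """
    distribution_hz = crystal_hz * card_pll_mult  # PLL on the oscillator card
    if distribution_hz > max_distribution_hz:
        raise ValueError(
            f"{distribution_hz / 1e6:.0f} MHz exceeds the distribution limit")
    processor_hz = distribution_hz * chip_pll_mult  # PLL on the processor chip
    return distribution_hz, processor_hz


# Hypothetical multipliers: 16 MHz x 25 on the card, then x 4 on the chip.
print(frequency_chain(CRYSTAL_HZ, 25, 4, max_distribution_hz=500e6))
```

In this toy model, raising the card multiplier to 40 exceeds the assumed 500-MHz limit and raises an error, which is why the final multiplication to the processor frequency is left to the PLL on the processor chip.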
Adding a second source of clock generation is a first
step toward a failsafe, redundant oscillator generation
and distribution design. Together with a multiplexer on
the processor cards, it makes it possible to switch from
a failing oscillator to a backup oscillator clock source.
However, such a design is not capable of automatic
switchover and is unable to detect certain classes of
The next step forward is the introduction of intelligent
monitoring and switchover. In such a design it is critical
that all potential failure modes be addressed early in the
design phase to ensure optimum effectiveness. If the
design covers only a few classes of defects, the complexity
and additional failure rates of the added components may
offset the availability benefits. In addition, all support
logic and firmware must be adequately robust to ensure that there are no single points of failure left in the overall design. The z9 clock signal generation and distribution design addresses all of these issues.

©Copyright 2007 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.
IBM J. RES. & DEV. VOL. 51 NO. 1/2 JANUARY/MARCH 2007 M. J. MUELLER ET AL.
0018-8646/07/$5.00 © 2007 IBM
System z9 clock signal generation and distribution
The z9 design is based on a superscalar microprocessor architecture. The central electronic complex (CEC) uses the same processor book concept introduced with the IBM zSeries 990. The processor book contains processor units (PUs), cache, memory, and the cross-book cache coherency fabric (Figure 1). The memory bus adapter (MBA) chips are now on the MBA fan-out cards that are plugged into the processor book. In addition, each processor book contains a clock chip (CLK). Each CLK receives the clock signals from both oscillator cards and the external timer reference (ETR) signals from both ETR cards. A Sysplex Timer* provides a common reference time to each zSeries system clustered together to form a Parallel Sysplex*. The firmware required for the CLK initialization and run-time clock distribution runs on redundant controllers, called flexible service processors (FSPs). This redundancy ensures that firmware control is based on the latest status of the clock distribution structure.
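The redundancy argument can be checked mechanically on a small topology model: for every single component failure, each CLK must still have at least one oscillator card it can use. The sketch below models only the oscillator cards and their power supplies, and the component names are hypothetical labels; the four CLKs correspond to the four processor books.

```python
"""Hypothetical model: which single component failures leave some CLK without
an oscillator source?  Only oscillator cards and their power supplies are modeled."""
from typing import Dict, List, Set

# Four processor books, each with one CLK fed by both oscillator cards.
REDUNDANT_FEEDS: Dict[str, Set[str]] = {f"CLK{b}": {"OSC0", "OSC1"} for b in range(4)}
# Each oscillator card depends on its own power supply (illustrative names).
REDUNDANT_DEPS: Dict[str, Set[str]] = {"OSC0": {"PS0"}, "OSC1": {"PS1"}}


def single_points_of_failure(feeds: Dict[str, Set[str]],
                             deps: Dict[str, Set[str]]) -> List[str]:
    """Return components whose failure alone leaves at least one CLK unclocked."""
    components = set(deps) | set().union(*deps.values())

    def survives(clk: str, failed: str) -> bool:
        # A CLK survives if some feeding oscillator card is neither the
        # failed component nor dependent on it.
        return any(osc != failed and failed not in deps[osc] for osc in feeds[clk])

    return sorted(c for c in components
                  if not all(survives(clk, c) for clk in feeds))


print(single_points_of_failure(REDUNDANT_FEEDS, REDUNDANT_DEPS))
print(single_points_of_failure({"CLK0": {"OSC0"}}, {"OSC0": {"PS0"}}))
```

The redundant topology reports no single points of failure, whereas a single-source design reports both the oscillator and its supply. The model does not capture the monitoring and switching logic itself, which, as noted above, must also be free of single points of failure.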
There were several significant challenges to the overall design. First, because the z9 is a synchronous multiprocessor system, the interfaces between processor chips require low-skew¹ oscillator signals. Therefore, even during an oscillator switchover, the skew has to be maintained within a small, tolerable range. Second, no missing pulses were allowed to occur during switchover. This requires that the first missing oscillator pulse be detected and that the switchover to the backup oscillator be completed within one cycle. Third, the detection and switchover design was required to handle many different types of failure modes, such as failures of power supplies, crystal oscillators, PLLs, and monitoring and switching circuits, as well as frequency drift and clock signal wiring defects. Finally, the oscillator repair had to be performed concurrently with system operation.
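Using the definition of skew as the difference in arrival time of a clock signal at two or more circuits, the sketch below computes skew as the spread of edge arrival times. It shows why a switchover leaves skew unchanged only if it shifts every chip's clock by the same amount. The chip names and arrival times are made-up illustration values.

```python
from typing import Dict


def skew_ps(arrival_ps: Dict[str, float]) -> float:
    """Clock skew: latest minus earliest edge arrival across the receiving chips."""
    return max(arrival_ps.values()) - min(arrival_ps.values())


def shifted(arrival_ps: Dict[str, float], extra_ps: Dict[str, float]) -> Dict[str, float]:
    """Arrival times after a switchover adds extra_ps[chip] to each chip's path."""
    return {chip: t + extra_ps.get(chip, 0.0) for chip, t in arrival_ps.items()}


before = {"PU0": 1210.0, "PU1": 1225.0, "SC0": 1198.0}  # hypothetical values (ps)
uniform = shifted(before, {chip: 50.0 for chip in before})  # same shift everywhere
uneven = shifted(before, {"PU1": 50.0})                     # only one chip shifted

print(skew_ps(before), skew_ps(uniform), skew_ps(uneven))
```

A uniform shift leaves the skew unchanged, while shifting a single chip's clock adds directly to it. This is consistent with the design described later, in which all CLKs switch in the same cycle and the delay lines on all CLKs are held at the same absolute value.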
The System z9 redundant clock generation and distribution with dynamic switchover is based on the Intelligent Dynamic Clock Switch (IDCS) module from Freescale Inc.² and the customized z9 CLK (Figure 2). Because the z9 design engineers were involved early in the definition and design stages of the IDCS, they were able to influence the specification of the circuit to meet the rigorous IBM z9 requirements.

Figure 1 System z9 clock signal generation and distribution system structure. (Labels include the z9 processor cage; SC = system cache control; SD = system data cache.)

Figure 2 System z9 redundant oscillator signal generation and distribution with automatic switchover function. (FB = feedback.)

¹Skew is the difference in the arrival time of a clock signal at two or more circuits. Ideally the clock skew should be zero. Skew is introduced by wire length differences, manufacturing variations, and other physical characteristics.
²Freescale Semiconductor, Inc. sold its Timing Solutions business to Integrated Device Technology, Inc.
Two design options were initially considered. In the first, the IDCS module would be mounted directly on each of the processor boards with two separate external oscillator sources. However, this option was dropped because an IDCS failure would cause the entire processor card to fail: the output frequency decreases slightly during a switchover, and all four processor cards are required to be fully frequency-synchronized. The second option, mounting the IDCS on each oscillator card and cross-feeding the output of each card into the IDCS, resolved this issue. However, the design was still exposed to failures of the voltage regulator modules or of the IDCS itself. Finally, a combination of the two options was selected. In the final design, the IDCS is placed on the oscillator card, and the dynamic oscillator switchover function is implemented on the CLK as the CLK oscillator switchover function (CLK_OSF).
With this design approach, the switchover from one
oscillator signal to a secondary oscillator signal can occur
in the IDCS on the oscillator card or in the CLK_OSF
on the CLKs. The IDCS automatic switchover on the
oscillator card occurs when one of its asynchronous
inputs fails [(1) in Figure 2]. During the IDCS switchover,
the frequency at the output is slightly reduced until the
circuit is relocked to the secondary input.
The CLK_OSF automatic switchover occurs when the
primary input signal (2) fails. During switchover to the
a single oscillator cycle to be slightly longer than all other
cycles. In order to achieve CLK_OSF automatic switchover, both input signals (2) and (4) must be synchronized, and the secondary input signal must arrive early with respect to the primary input signal. The
synchronization is achieved by feeding the output from
IDCS0 on OSC0 to the input of IDCS1 on OSC1.
Therefore, both CLK input signals are generated from
signal is generated by adding a delay line in the feedback
path of the IDCS on the oscillator card (labeled FB in
Figure 2). The delay takes into account the tolerances of
the IDCS and wiring. The delay value has to be greater
than zero, typically 500 ps, to allow shifting the secondary
signal by means of delay lines in the CLK_OSF until the
rising edge of the secondary signal occurs approximately
50 ps later than the rising edge of the primary signal at
signals (2) and (4) sent to the CLK and the internal time
of the CLK_OSF are shown in Figure 3.
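With the typical numbers quoted in this paper (a secondary signal arriving about 500 ps early at the CLK, a CLK_OSF delay line of 128 steps of roughly 20 ps each, and a target lag of about 50 ps), the required delay setting can be worked out directly. The rounding rule used here, which picks the smallest setting that reaches the target, is an assumption for illustration.

```python
import math
from typing import Tuple

STEP_PS = 20     # typical delay per step of the CLK delay line
MAX_STEPS = 128  # delay circuits available in the line


def steps_for_lag(lead_ps: int, target_lag_ps: int,
                  step_ps: int = STEP_PS, max_steps: int = MAX_STEPS) -> Tuple[int, int]:
    """Smallest delay setting that makes the secondary edge trail the primary
    edge by at least target_lag_ps.  Returns (steps, resulting_lag_ps)."""
    steps = math.ceil((lead_ps + target_lag_ps) / step_ps)
    if steps > max_steps:
        raise ValueError("required delay exceeds the delay line range")
    return steps, steps * step_ps - lead_ps


def switchover_cycle_ps(period_ps: int, lag_ps: int) -> int:
    """Length of the one cycle in which the output moves to the later
    secondary edge: the nominal period stretched by the lag."""
    return period_ps + lag_ps


steps, lag = steps_for_lag(lead_ps=500, target_lag_ps=50)
print(steps, lag)
```

With 20-ps granularity the resulting lag falls between 50 and 70 ps in this model. The cycle in which the switchover happens is stretched by that lag, matching the observation that a single oscillator cycle becomes slightly longer than all other cycles.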
High-level functional description
The oscillator generation and distribution consists of a
16-MHz voltage-controlled crystal oscillator, a PLL,
an IDCS, a control chip, and a power supply. These
components are located on an oscillator card. The 16-
MHz oscillator is used to generate a stable frequency.
This signal is fed to a PLL, where it is multiplied to the
final clock distribution frequency. From the PLL, one
output signal is sent to the CLK0 input of the IDCS on
the same card, and a copy of this signal is connected
to the CLK1 input of the IDCS on the other card.
The IDCS monitors up to four differential oscillator
inputs continuously (only three are actually used in
the design). In the case of a failing signal, the IDCS
automatically switches over to the next input in a round-
robin fashion. The selected input, which is initially defined through the I2C (Inter-Integrated Circuit) bus, is repowered and sent out on up to eight differential
outputs. In addition to this basic function, the status
of the input signals can be observed and reported to
a control system.
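The round-robin input selection can be sketched as follows. The health flags stand in for the IDCS input monitors, unused inputs are treated as permanently unhealthy, and the function name is hypothetical; this is a behavioral model, not the IDCS circuit.

```python
from typing import List, Optional


def round_robin_next(current: int, healthy: List[bool]) -> Optional[int]:
    """Next healthy input after `current`, searching in round-robin order.

    healthy[i] is True if input i currently carries a good signal; unused
    inputs are simply marked False.  Returns None if no other input is usable.
    """
    n = len(healthy)
    for offset in range(1, n):
        candidate = (current + offset) % n
        if healthy[candidate]:
            return candidate
    return None


# Four monitored inputs, three used (input 3 unused).
print(round_robin_next(0, [False, True, True, False]))   # input 0 failed
print(round_robin_next(2, [True, False, False, False]))  # wraps around to input 0
```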
The oscillator cards and the corresponding components
are set up and controlled by the control chips and the
associated firmware running on the FSPs. The setup
and control functions are the following:
• Power-on and power-off of oscillator card.
• Check power status (overcurrent, undervoltage).
• Set up PLL (frequency).
• Check PLL status (PLL locked).
• Set up IDCS (frequency range, IDCS mode, primary and secondary input).
• Check IDCS status (inputs running or failing, which input is used, lock indicator).
• Set up OSC_SELECT signals to CLK.

Figure 3 Oscillator signals (2) and (4) sent to the CLK and internal timing of the CLK_OSF. (Recoverable labels include "of input select", "to PU chips", "Phase jump in case of oscillator error", and "error stuck at 0".)
Each of the two oscillator cards (Figure 2) can
be set up as a primary card, secondary card, single
card, or disabled card. Setup is performed using the
OSC_SELECT signals (7) and the setup of the IDCS
(Table 1). Assuming that oscillator card 0 (OSC0) is
set up as the primary card and OSC1 is set up as the
secondary, signal (1) is selected from the PLL0 by IDCS0
and is driven as the primary clock signal (2) to all of the
CLKs. A second signal (5) from OSC1 is connected to
input 1 of IDCS0 (defined as the secondary clock input,
based on Table 1). This signal (5) has the same frequency
as (1). However, the two signals do not have a fixed phase
relation and therefore are not synchronous. In case of a
failure of the oscillator or the PLL0 on the OSC0 card,
the IDCS0 switches automatically from input 0 to input 1
while continuously providing output signals to the CLKs.
However, to address certain types of failures—such as the
power supply (PS0), IDCS0, or wiring between the
IDCS0 and the CLKs—the switchover function
(CLK_OSF) was implemented on the CLK.
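Table 1 can be captured as a small lookup, which makes the cross-connection explicit: on the primary card, input 1 carries the other card's oscillator, and on the secondary card, input 2 carries the primary card's output. The `is_redundant_pair` rule (one primary plus one secondary card for dynamic switchover) is an assumption drawn from the scenario described above, not a rule stated in the table.

```python
from typing import Dict, List, Tuple

# Table 1: card role -> (IDCS input 0, IDCS input 1, IDCS input 2)
TABLE_1: Dict[str, Tuple[str, str, str]] = {
    "primary":   ("primary oscillator in", "secondary oscillator in", "disabled"),
    "secondary": ("secondary oscillator in", "disabled", "primary oscillator in"),
    "single":    ("oscillator in", "disabled", "disabled"),
    "disabled":  ("oscillator in", "disabled", "disabled"),
}


def enabled_inputs(role: str) -> List[int]:
    """IDCS inputs that Table 1 enables for a card in the given role."""
    return [i for i, source in enumerate(TABLE_1[role]) if source != "disabled"]


def is_redundant_pair(role_osc0: str, role_osc1: str) -> bool:
    """Hypothetical check: dynamic switchover needs one primary and one secondary card."""
    return {role_osc0, role_osc1} == {"primary", "secondary"}


print(enabled_inputs("primary"), enabled_inputs("secondary"))
print(is_redundant_pair("primary", "secondary"), is_redundant_pair("single", "disabled"))
```

In the OSC0-primary/OSC1-secondary scenario, a PLL0 failure makes IDCS0 move from input 0 to input 1, the only other input enabled for a primary card.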
CLK switchover structure
The REC 0 and REC 1 functions on the CLK receive
the oscillator signal from oscillator card 0 and card 1,
respectively (Figure 4). Directly attached to each receiver
is a wire test block. The wire test function continuously
observes the positive and the negative leg of the
differential oscillator signal. If one leg of the differential
signal is broken, the wire test sends an error signal to the
CNTL block. If a secondary oscillator is available, a
switchover is triggered. This function is required because
a differential receiver may show a good signal at its
output—even with only one input working correctly and
the second input broken. In this state, the system would
be exposed to higher clock jitter, which might cause
system failure. Otherwise, the system would be exposed to
failure due to jitter induced by a failure of the differential
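The failure mode that motivates the wire test can be shown with a toy comparator model. If one leg of the differential pair is broken and floats at a mid-level voltage, a receiver that compares the two legs still produces a clean-looking clock, while a per-leg activity check flags the broken leg. The floating level and the "leg stopped toggling" criterion are assumptions for illustration, not the documented z9 wire-test circuit.

```python
from typing import List


def receiver(p: List[float], n: List[float]) -> List[int]:
    """Idealized differential receiver: output 1 whenever P is above N."""
    return [1 if pv > nv else 0 for pv, nv in zip(p, n)]


def broken_legs(p: List[float], n: List[float]) -> List[str]:
    """Toy wire test: a leg that never changes level over the window is flagged."""
    return [name for name, leg in (("P", p), ("N", n)) if len(set(leg)) < 2]


p_leg = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
n_good = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
n_broken = [0.5] * 6  # broken N leg floating at a mid-level (assumed)

print(receiver(p_leg, n_good), broken_legs(p_leg, n_good))
print(receiver(p_leg, n_broken), broken_legs(p_leg, n_broken))
```

Both cases produce the same receiver output, which is why the wire test observes each leg separately instead of relying on the receiver output.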
The output of the differential receiver is also connected
to a programmable delay line. This delay line has 128
steps, typically with 20 ps per step. The delay setting
associated with the delay line in the primary path is
always set to 0; hence, there is virtually no delay from the
output of the primary oscillator to the receiver of the
input select block. The delay setting associated with the
secondary path is set such that the input select block
always receives the secondary oscillator signal later than
the primary oscillator signal. This is accomplished by the
phase-compare function that continuously measures the
two signals to the input select and adjusts the delay line
accordingly. The delay difference between the primary and the secondary signals at the input select is kept as small as possible; otherwise, a switchover would appear as a phase jump in the clock signal, which is not acceptable. The typical phase difference achieved in System z9 is 50 ps.
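The continuous phase-compare adjustment can be sketched as a simple controller that moves the secondary delay one step at a time. It assumes an integer setting of 0 to 128 steps of 20 ps in the secondary path, a primary-path delay fixed at 0, and a target lag of 50 ps. The step-up/step-down rule is an illustrative choice, not the documented z9 control law.

```python
from typing import Tuple

STEP_PS = 20        # typical delay per step
MAX_SETTING = 128   # delay circuits in the secondary-path delay line
TARGET_LAG_PS = 50  # desired lag of secondary behind primary at the input select


def phase_compare_step(setting: int, lag_ps: int) -> int:
    """One adjustment of the secondary-path delay setting.

    lag_ps is how far the secondary edge trails the primary edge at the input
    select (negative if the secondary arrives first).  The primary path delay
    is always 0, so only the secondary setting moves.
    """
    if lag_ps < TARGET_LAG_PS and setting < MAX_SETTING:
        return setting + 1  # too close or early: add one delay step
    if lag_ps - STEP_PS >= TARGET_LAG_PS and setting > 0:
        return setting - 1  # one step can be removed without losing the target
    return setting


def settle(lead_ps: int, setting: int = 0, iterations: int = 200) -> Tuple[int, int]:
    """Run the controller against a secondary that leads by lead_ps before delay."""
    for _ in range(iterations):
        setting = phase_compare_step(setting, setting * STEP_PS - lead_ps)
    return setting, setting * STEP_PS - lead_ps


print(settle(500))              # start from 0 with a 500-ps lead
print(settle(480, setting=28))  # lead drifts smaller: controller backs off
```

In this model the settled lag always lies within one step above the 50-ps target, whichever way the lead drifts.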
The input select consists of an edge-triggered flip-flop
with two inputs. A rising edge of either input sets the flip-
flop and sends a 1 to the REPOWER block; a falling
edge resets this flip-flop and sends a 0. The input signal
received first is always sent to the output. The input select
sends the proper oscillator signal as soon as one of the
inputs works properly, even if the other input is stuck at
1 or 0.
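The set/reset behavior of the input select can be checked with a discrete-time model. Each list holds one sample per time unit, the secondary copy trails the primary by one sample, both lines are taken as low before the first sample, and a sample with both a rising and a falling edge is resolved in favor of the rising edge. These are modeling assumptions, not details of the z9 circuit.

```python
from typing import List


def input_select(a: List[int], b: List[int]) -> List[int]:
    """Edge-triggered flip-flop model of the input select.

    A rising edge on either input sets the output to 1; a falling edge on
    either input resets it to 0.  Both lines are assumed low before sample 0.
    """
    out: List[int] = []
    state, prev_a, prev_b = 0, 0, 0
    for cur_a, cur_b in zip(a, b):
        rising = (cur_a and not prev_a) or (cur_b and not prev_b)
        falling = (prev_a and not cur_a) or (prev_b and not cur_b)
        if rising:
            state = 1
        elif falling:
            state = 0
        out.append(state)
        prev_a, prev_b = cur_a, cur_b
    return out


primary = [1, 1, 1, 1, 0, 0, 0, 0] * 3
secondary = [0] + primary[:-1]  # trails the primary by one sample

print(input_select(primary, secondary) == primary)             # both inputs good
print(input_select([0] * 24, secondary) == secondary)          # primary stuck at 0
print(input_select([1] * 24, secondary)[1:] == secondary[1:])  # primary stuck at 1
```

The stuck-at-1 comparison skips the first sample, where the stuck input's initial rise sets the output before the secondary's first edge arrives.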
The input select block provides other significant functions. It provides a lock mechanism that ensures that switching back cannot occur after a switchover has taken place. This prevents bouncing back and forth due
to intermittent types of failures (e.g., a poor contact or
solder joint). In addition, it contains a filtering function
that prevents erroneous switchover due to minimal noise
or short-term jitter.
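The lock and filtering behavior can be summarized as a small state machine. The jitter threshold is a hypothetical value. Modeling the filter as a tolerance on edge timing, rather than as a count of bad cycles, is an assumption chosen so that a genuinely missing pulse still triggers a switchover within one cycle.

```python
from typing import Optional


class InputSelectLock:
    """Sticky switchover with a jitter filter (threshold value is hypothetical)."""

    def __init__(self, jitter_threshold_ps: float) -> None:
        self.jitter_threshold_ps = jitter_threshold_ps
        self.on_secondary = False

    def update(self, primary_edge_error_ps: Optional[float]) -> str:
        """Feed one primary-edge observation; return the selected source.

        primary_edge_error_ps is the deviation of the primary edge from its
        expected time, or None if the edge is missing altogether.
        """
        if not self.on_secondary:
            missing = primary_edge_error_ps is None
            if missing or abs(primary_edge_error_ps) > self.jitter_threshold_ps:
                self.on_secondary = True  # switch over ...
        # ... and never switch back, even if the primary looks good again.
        return "secondary" if self.on_secondary else "primary"


lock = InputSelectLock(jitter_threshold_ps=30.0)
print([lock.update(e) for e in (5.0, -12.0, None, 0.0, 1.0)])
```

The last two observations show the lock: an intermittent fault, such as a poor solder joint that briefly recovers, does not pull the selection back to the primary.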
The REPOWER function serves to redrive the output signal of the input select block to the processor chips. The signal paths are also implemented with differential point-to-point connections to minimize any potential signal

Table 1 Oscillator setup conditions.
                Primary card              Secondary card            Single card      Disabled card
IDCS input 0    Primary oscillator in     Secondary oscillator in   Oscillator in    Oscillator in
IDCS input 1    Secondary oscillator in   Disabled                  Disabled         Disabled
IDCS input 2    Disabled                  Primary oscillator in     Disabled         Disabled
The delay decrement CNTL function guarantees low
skew between processor chips after a switchover occurs.
This is achieved by setting the delay line greater than 0 in
the active oscillator path. Once the defective oscillator
card is concurrently replaced, the delay decrement CNTL
function incrementally resets the delay line back to 0. This
ensures that the oscillator card replacement is prepared to resume the primary path if necessary and also reduces jitter due to the additional circuits within the delay line.
The CNTL block contains all combinatorial logic
required to set up the oscillator switch function correctly.
The CNTL function decodes incoming OSC_SELECT
signals and sends control and status signals to the other
functions. In addition, it gathers and interprets the status
of the wire test, phase compare, and input select blocks,
and generates the two OSC_STATUS output signals that
are sent back to each oscillator card. The OSC_STATUS
provides information on the CLK condition, oscillator
operation, and secondary oscillator, and indicates
whether the CLK has switched to the secondary card. The
oscillator switch status register also maintains important
status information on the oscillator operation, the wire
test, the active oscillator, and the condition of the
Delay decrement CNTL
Each CLK in the system receives the same logical
representation of the OSC_SELECT signals. The CLKs
select between the two redundant incoming differential
oscillator signals from OSC0 or OSC1. A change of the
OSC_SELECT signal indicates a switchover from the
primary oscillator card to the secondary oscillator card.
An OSC_SELECT change is asserted to all CLKs in
the same cycle, ensuring a dynamic and synchronous
switchover for all CLKs. When a CLK itself is the root
cause of the switchover, the erroneous CLK triggers an
immediate local switchover. However, the changed OSC_SELECT signals are still asserted to all CLKs in the system in the same cycle, within a short time period after the occurrence of the error.
The delay decrement CNTL logic reduces the value of
the delay line (which is in the active oscillator path) from
the given value after the switchover occurred, down to 0.
Each delay line contains 128 delay circuits, with a typical delay value of 20 ps per circuit. Because of CLK process variations, the delay value per circuit varies from chip to chip. After a switchover, the absolute delay value $D_i$ of the delay line on each CLK $i$ is the same on all CLKs in the system. However, the number of delay circuits $x_i$ varies from chip to chip because of variations in the CLK circuit delay $d_{\mathrm{cir},i}$:

$$D_i = x_i \cdot d_{\mathrm{cir},i} \qquad \text{(value of delay line on CLK}_i\text{)}$$
$$D_0 = x_0 \cdot d_{\mathrm{cir},0} \qquad \text{(value of delay line on CLK}_0\text{)}$$
$$D_1 = x_1 \cdot d_{\mathrm{cir},1} \qquad \text{(value of delay line on CLK}_1\text{)}$$
$$D_0 = D_1 = \cdots = D_i \qquad \text{(value of delay line is the same across all CLKs)}$$

Therefore, in order for each CLK to achieve the same delay line value, the number of delay circuits $x_i$ must be individually selected for each CLK.
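Selecting $x_i$ per chip amounts to dividing the common delay $D$ by each chip's own circuit delay $d_{\mathrm{cir},i}$. The per-chip circuit delays below are hypothetical values spread around the typical 20 ps, and rounding to the nearest whole circuit is an assumed policy.

```python
from typing import Dict

MAX_CIRCUITS = 128  # delay circuits per delay line


def circuits_for_delay(target_ps: float, d_cir_ps: float) -> int:
    """x_i: number of delay circuits that best approximates target_ps on one CLK."""
    x = round(target_ps / d_cir_ps)
    if not 0 <= x <= MAX_CIRCUITS:
        raise ValueError("target delay not reachable with this delay line")
    return x


d_cir: Dict[str, float] = {"CLK0": 20.0, "CLK1": 18.5, "CLK2": 22.0}  # hypothetical (ps)
D = 600.0  # hypothetical common target delay (ps)

x = {clk: circuits_for_delay(D, d) for clk, d in d_cir.items()}
realized = {clk: x[clk] * d_cir[clk] for clk in d_cir}
print(x)
print(realized)
```

The realized delays differ by a few picoseconds because $x_i$ must be an integer; in this model the residual error on each chip is at most half of its own circuit delay.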
The removal of delay circuits is performed in equidistant steps over time. The equidistant step rate $R_i$ for each individual CLK is calculated before the removal starts. The calculation is performed such that the overall time of removal $T_R$ is the same for every CLK in the system. Because the number of circuits that have to be removed can be different on every CLK, the step rate varies from CLK to CLK. This approach guarantees minimum skew between chips during the time of removal.
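The per-CLK step rate follows from an integer division of the common removal time by the number of circuits to remove on that chip, which Eq. (1) formalizes. The sketch assumes that $R_i$ is the number of CLK cycles between successive removals, and it uses a hypothetical $T_R$ of 1000 cycles with hypothetical circuit counts of 30, 32, and 27.

```python
from typing import List, Tuple


def step_rate(t_r: int, x_i: int) -> Tuple[int, int]:
    """Return (R_i, r_i) with T_R = R_i * x_i + r_i and 0 <= r_i < x_i."""
    if x_i <= 0 or t_r < x_i:
        raise ValueError("need 0 < x_i <= T_R so that R_i is at least one cycle")
    return divmod(t_r, x_i)


def removal_cycles(t_r: int, x_i: int) -> List[int]:
    """Cycles at which one delay circuit is removed: every R_i cycles, x_i times."""
    if x_i == 0:
        return []  # nothing to remove on this CLK
    r_rate, _ = step_rate(t_r, x_i)
    return [r_rate * k for k in range(1, x_i + 1)]


T_R = 1000                                       # hypothetical removal window (cycles)
circuits = {"CLK0": 30, "CLK1": 32, "CLK2": 27}  # hypothetical x_i values

for clk, x_i in circuits.items():
    cycles = removal_cycles(T_R, x_i)
    print(clk, step_rate(T_R, x_i), cycles[-1])
```

Each chip performs its last removal $r_i$ cycles before $T_R$, and because $r_i < x_i$, all CLKs reach zero delay at nearly the same time.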
The overall time of circuit removal expressed in CLK cycles is defined as

$$T_R = R_i \cdot x_i + r_i, \tag{1}$$

where $r_i$ is an integer remainder on CLK $i$ and $0 \le r_i < x_i$.
Equation (1) relates the two positive integers $T_R$ and $x_i$. Given a dividend $T_R$ and a divisor $x_i$, the integer division consists of obtaining an integer quotient $R_i$ and an integer remainder $r_i$. The hardware
Figure 4 Clock chip oscillator signal switchover function. (Recoverable labels include inputs "from card 0" and "from card 1", and "Two signals received from both oscillator cards (for a total of four signals)".)