# 0.84 ps Resolution Clock Skew Measurement via Subsampling

**ABSTRACT** An all-digital on-chip clock skew measurement system via subsampling is presented. The clock nodes are subsampled with a near-frequency asynchronous sampling clock to result in beat signals which are themselves skewed in the same proportion but on a larger time scale. The beat signals are then suitably masked to extract only the skews of the rising edges of the clock signals. We propose a histogram of the arithmetic difference of the beat signals which decouples the relationship of clock jitter to the minimum measurable skew, and allows skews arbitrarily close to zero to be measured with a precision limited largely by measurement time, unlike the conventional XOR based histogram approach. We also analytically show that the proposed approach leads to an unbiased estimate of skew. The measured results from a 65 nm delay measurement front-end indicate that for an input skew range of ±1 fan-out-of-4 (FO4) delay, ±3σ resolution of 0.84 ps can be obtained with an integral error of 0.65 ps. We also experimentally demonstrate that a frequency modulation on a sampling clock maintains precision, indicating the robustness of the technique to jitter. We also show how FM modulation helps in restoring precision in case of rationally related clocks.




This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS


Bharadwaj Amrutur, Member, IEEE, Pratap Kumar Das, Student Member, IEEE, and Rajath Vasudevamurthy, Student Member, IEEE


Index Terms: Arithmetic difference, asynchronous subsampling, clock skew measurement, frequency modulation, histogram analysis, time-to-digital converter.

I. INTRODUCTION

Precise on-chip delay measurement has been a challenge since the evolution of digital integrated circuits. With tight timing budgets, there is a need for measuring the skew in the clock network in the presence of increasing process variability, to enable active skew compensation. This requires measurement of skews between the periodic clock signals at various leaf nodes. Delay measurement of many circuit structures can also be converted to skew measurement of periodic signals by exciting these with a periodic source. In general, the delay is digitized using various types of time-to-digital converters (TDCs). The authors in [1] propose a very precise coarse-fine time-to-digital converter based on the principle of successive-approximation ADCs and a time amplifier. The authors in [2] propose a flash time-to-digital converter which utilizes arbiters and can be calibrated for a very high resolution. It uses the spread of arbiter threshold voltages to obtain a set of digital codes from an array of arbiters to measure the delay. However, it is limited by the nonuniform distribution of the offsets of the arbiters, and also works only for a small timing range. The authors in [3] and [4] survey some of the popular digital techniques for delay measurement which use tapped and vernier delay line methods. Some of the issues of the traditional vernier delay line are addressed by using a component-invariant vernier delay line in [5]. Most of these TDCs have the capability to measure delays between edges of two aperiodic signals. They can also make a single-shot measurement, that is, a single occurrence of an edge pair can be used to determine their time separation. The authors in [6] propose a scheme to characterize the period jitter by obtaining the cumulative distribution function of the clock edges, and modify it to measure the skew between two leaf nodes of a clock distribution network.

However, when measuring skews between periodic signals, asynchronous sampling followed by histogram analysis leads to a simpler implementation. Asynchronous sampling has been proposed as a way to evaluate data converters in [7]. An asynchronous sampling clock achieves the effect of uniform random sampling across all the voltage bins for the data converter. This idea has been applied in [8]–[11] to calibrate the delay between two clock phases. Instead of obtaining histogram counts across voltage bins, the authors set up a histogram to count the number of times the asynchronous sampling clock occurs in between the edges of the two periodic signals under measurement, by taking an XOR of the sampled outputs. The ratio of the histogram count to the total number of samples gives an estimate of the phase spacing. However, in this approach, the minimum skew that can be measured is limited by the clock jitter. As an illustration, if the two edges are at nominally zero skew, then due to uncorrelated jitter between them, every once in a while the sampler outputs will be different. This in turn causes the XOR output to go high, resulting in an increment to the histogram counter, which will then show an erroneous nonzero value for the skew. The minimum nonzero skew that can be detected with 99% probability is therefore set by the uncorrelated jitter of the two edges. The authors in [8] propose a work-around to this limitation by measuring the skews of each clock node with reference to another signal which has a delay much larger than this jitter. Our proposed approach does away with the need for another clock signal against which to measure each node's delay. In the Appendix, we show analytically that this approach presents an unbiased estimator of skew. We experimentally validate the technique using a 65 nm test chip and verify that the clock jitter does not limit the precision, but only measurement time does. Our current work is an improvement to an earlier work discussed in [12], which has a more complex hardware implementation.

The paper is organized as follows: in Section II we describe an alternative technique to determine a bin hit, which in turn gives us the input skew estimate. This eliminates the dependency of bin size on clock jitter and is limited (theoretically) by measurement time only. We describe our experimental setup to validate these ideas in Section III. The measurement results from a 65 nm test chip are provided in Section IV, followed by our conclusions in Section V.

Manuscript received February 25, 2010; revised August 05, 2010; accepted September 20, 2010.

The authors are with the Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore-560012, India (e-mail: amrutur@ece.iisc.ernet.in; pratap@ece.iisc.ernet.in; rajath@ece.iisc.ernet.in).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2010.2083706

1063-8210/$26.00 © 2010 IEEE

Fig. 1. Illustration of subsampling system. An extra sampler is introduced at each leaf node for skew measurement.

II. HISTOGRAM OF ARITHMETIC DIFFERENCE

A. System Overview

Consider an arbitrary buffer and interconnect network, in which we need to measure the skew (or delay) between two nodes, as shown in Fig. 1. For example, this could be a clock distribution network and the nodes of interest might be leaf nodes of such a network. By exciting the input of this network with a periodic clock source of period $T$, we can expect periodic outputs at the two leaf nodes, which have a relative skew. We introduce two samplers, one at each of these nodes, which are clocked by a separate sampling clock with a slightly different period $T + \Delta T$.¹ We can use a sampling clock which is either asynchronous to the input clock (obtained by using an independent crystal source) or is rationally related (via a DLL/PLL), but has additional jitter added (through FM). The outputs of the two samplers will be beat signals as shown in Fig. 2, whose period is $T(T + \Delta T)/\Delta T$, which is essentially the sampling clock period amplified by a factor $T/\Delta T$. The input skew is amplified by the same factor and appears as a skew between the subsampled outputs. The skew in terms of unit interval, i.e., as a fraction of the clock period, is then digitally measured by the proposed delay measurement unit. Note that any skew in the sampling clock to the two samplers will also add to the skew in the beat signals, as shown in Fig. 3. But this is inevitable for any skew measurement in a distributed network which requires a reference clock. Similarly, the input voltage offset of the two samplers, in conjunction with the finite slew of the input signals, also adds to the skew in the beat signals. Hence, what the delay measurement unit actually measures is the input skew together with these two error contributions. Thus, the accuracy of measurement is limited by the quality of sampling clock distribution and input sampler mismatch. However, we will show later that the precision of the measurements will be largely determined by the measurement time, and this will be the focus of discussion in the rest of the paper.

¹ Note that $T - \Delta T$ as well as any $nT \pm \Delta T$ will also work for an integer $n$. However, for $n > 1$, the measurement time is increased by the factor of $n$ to achieve the same accuracy.

Fig. 2. Illustration of signal period amplification in subsampling system.

Fig. 3. Illustration of different sources of errors in the measurement system.
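The period amplification described above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a clock period `T` and a sampling period `T + dT`; the function names and the numerical values (4 ns and 4 ps) are illustrative choices for this sketch and are not taken from the measurements.

```python
from fractions import Fraction


def beat_period(T, dT):
    """Period of the beat signal for clock period T and sampling period T + dT."""
    if dT <= 0:
        raise ValueError("dT must be positive")
    return (T + dT) * T / dT


def amplification(T, dT):
    """Factor by which the sampling period (and any input skew) is stretched."""
    return T / dT


def beat_skew(skew, T, dT):
    """Skew between the two beat signals produced by a given input skew."""
    return skew / T * beat_period(T, dT)


# Hypothetical values in picoseconds (assumptions of this sketch).
T = Fraction(4000)   # 4 ns clock period
dT = Fraction(4)     # 4 ps period difference

print(amplification(T, dT))            # 1000
print(beat_period(T, dT))              # 4004000 (ps)
print(beat_skew(Fraction(1), T, dT))   # 1001 (ps) for a 1 ps input skew
# As a fraction of the beat period the skew is unchanged: 1 ps / T.
print(beat_skew(Fraction(1), T, dT) / beat_period(T, dT))  # 1/4000
```

Exact rational arithmetic is used only so that the printed values are free of rounding; the same relations hold with floating-point numbers.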

For measuring the skew between two nodes, the setup shown in Fig. 4 is used. Samplers are introduced at the two nodes to give a pair of subsampled signals. These signals are processed with the aid of de-bounce and masking state machines, which mask out the falling edge statistics, to give a pair of masked signals. Their arithmetic difference is accumulated in a counter for $2^k$ sampling clock cycles. It is then right shifted by $k$ bits (divided by $2^k$) to obtain the digital code word for the skew. Note that the delay measurement unit, consisting of the state machines and counters, can be shared across all the sampled nodes. By using a multiplexer to select two subsampled signals, skews between pairs of nodes can be obtained. The pair-wise skew information can then be stitched together to give the overall skew distribution across all the measured nodes. Since the subsampled outputs are in the domain of the sampling clock, which is the same as that for the delay measurement unit, routing of these signals is simplified greatly. The only constraint is the need for the same number of pipeline delays for each of the subsampled signals. This also allows for easy measurement of skews in very high speed clock networks. Because the delay measurement unit can be shared, the area overhead for this approach is very small.

Due to jitter, the finite rise-time of the signals, and the metastability of the samplers, the sampler outputs will have bounces between the digital values, as shown in Fig. 5. Since we are interested in finding the skews for only one polarity of edges of the inputs, we need to suitably mask out the sampled signals corresponding to the falling edges. This is done via two state machines, as shown in Fig. 5. The timing waveforms of the signals used in the state machines are also sketched. The de-bounce state machine detects the first rising edge on the sampler output and asserts the enable signal to cover the high duration of the beat signal. The high duration of the beat signal is determined by a timer clocked by the sampling clock. After the timer crosses the threshold, the first time either of the two signals falls, the mask signal, $m$, is de-asserted to de-assert both outputs simultaneously. The resulting masked signals contain only the rising edge information for the input signals, and hence their histogram analysis gives the rising edge statistics.

To ensure that the enable signals do not get falsely triggered and do not go to zero due to the bouncing edges near the rising transitions, the threshold should be set to an appropriate value. The upper limit on it is imposed by the $T/\Delta T$ ratio. In our implementation, it is set to 16 for the higher ratios and to 2 for the lower ratios, since for lower ratios, getting more than two bouncing transitions around a rising edge is highly unlikely for the practical values of jitter on the clocks.

Fig. 4. Setup for skew estimation.

Fig. 5. State machines and timing diagrams for various signals in the skew estimator unit.

When a histogram count of the XOR of the two subsampled signals is taken as presented in [8]–[11], the presence of jitter on the clocks leads to errors for estimated skews of the order of the jitter. This can be easily understood for the case of exactly zero nominal skew between the two nodes (Fig. 6). In this case, due to uncorrelated jitter in the two signals, the histogram count will increment to a nonzero value, which is related to the uncorrelated jitter. This effect is termed the reordering issue in [8]. The authors in [8] get around this by using a reference signal which has a much larger delay than the jitter, and measure the delays of the individual nodes against this. The values are then subtracted to give the delay between the original nodes. The proposed arithmetic difference in effect achieves this, but without needing an extra reference signal, thus reducing area and power.

High resolution measurement requires the sampling clock to uniformly sample at all times between the rising edges of the two signals. This is guaranteed by using an asynchronous sampling clock derived from an independent crystal.
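The difference between the XOR histogram and the arithmetic difference near zero skew can be illustrated with a small Monte Carlo model. This is a minimal sketch under simplifying assumptions that are not part of the paper: the sampling phase is drawn uniformly over one unit interval, each rising edge carries independent Gaussian jitter, and both masked signals are cleared together at a fixed phase of 0.25 UI instead of by the state machines. The function name `measure` and all numerical values are hypothetical.

```python
import random


def measure(skew_ui, jitter_ui, k, seed=0):
    """Return (xor_estimate, difference_estimate) of the skew in UI."""
    rng = random.Random(seed)
    n = 1 << k                      # 2**k sampling clock cycles
    xor_acc = 0
    diff_acc = 0
    for _ in range(n):
        x = rng.uniform(-0.5, 0.5)              # sampling phase in UI
        edge1 = rng.gauss(0.0, jitter_ui)       # jittered rising edge, node 1
        edge2 = rng.gauss(skew_ui, jitter_ui)   # jittered rising edge, node 2
        masked = x >= 0.25                      # common release of both signals
        p1 = int(not masked and x >= edge1)
        p2 = int(not masked and x >= edge2)
        xor_acc += p1 ^ p2          # conventional histogram count
        diff_acc += p1 - p2         # signed arithmetic difference
    # Dividing by 2**k plays the role of the k-bit right shift in hardware.
    return xor_acc / n, diff_acc / n


for skew in (0.0, 0.05):
    xor_est, diff_est = measure(skew, jitter_ui=0.01, k=17, seed=1)
    print(f"skew={skew:.3f} UI  xor={xor_est:.4f}  difference={diff_est:+.4f}")
```

In this model the XOR count stays clearly above zero at zero nominal skew, because every disagreement between the samplers is counted regardless of its sign, whereas the signed differences cancel on average.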


Fig. 6. Illustration of error due to jitter in input clocks for measurement of skews around zero.

B. Analytical Overview

When the sampling clock is asynchronous to the clock driving the measurement nodes, the ratio of their periods is an irrational number and hence can be written as an integer $N$ plus a nonzero fractional part. This causes the sampling edge to fall uniformly across the entire period of the sampled signal. Hence, the percentage of time the sampling edge falls in between the sampled edges is directly proportional to the skew as a fraction of the period.

Consider the times, within a clock period, at which the two node signals cross the logic high threshold, and the time at which the sampling clock crosses the sampling threshold. Due to jitter, these become random variables. The average of the arithmetic difference between the sampler outputs over the sampling clock cycles of a measurement is used as an estimate for the skew

(1)

where the averaged quantity is the arithmetic difference between the sampler outputs. It is shown in the Appendix that the expected value of this estimate equals the skew as a fraction of the clock period:

(2)

Thus, the delay measurement statistic is an unbiased estimator of the skew as a fraction of the clock period (UI). A theoretical loose upper bound for the standard deviation of the estimate (derived in the Appendix) is

(3)
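The two properties stated above, an unbiased estimate and a spread that shrinks with more samples, can be checked numerically. The sketch below is not the derivation of the Appendix; it assumes a uniformly distributed sampling phase, independent Gaussian jitter on each edge, and a common masking phase of 0.25 UI, and all names and values are hypothetical.

```python
import random
import statistics


def estimate_skew(skew_ui, jitter_ui, n_samples, rng):
    """Average of the arithmetic difference over n_samples sampling cycles."""
    total = 0
    for _ in range(n_samples):
        x = rng.uniform(-0.5, 0.5)                            # sampling phase, UI
        p1 = int(rng.gauss(0.0, jitter_ui) <= x < 0.25)       # masked sampler 1
        p2 = int(rng.gauss(skew_ui, jitter_ui) <= x < 0.25)   # masked sampler 2
        total += p1 - p2
    return total / n_samples


def mean_and_spread(skew_ui, jitter_ui, n_samples, trials=200, seed=0):
    """Mean and standard deviation of the estimate over repeated measurements."""
    rng = random.Random(seed)
    runs = [estimate_skew(skew_ui, jitter_ui, n_samples, rng) for _ in range(trials)]
    return statistics.mean(runs), statistics.stdev(runs)


mean_1k, std_1k = mean_and_spread(0.02, 0.01, 1000)
mean_4k, std_4k = mean_and_spread(0.02, 0.01, 4000)
print(f"1000 samples: mean={mean_1k:.4f} UI, std={std_1k:.5f} UI")
print(f"4000 samples: mean={mean_4k:.4f} UI, std={std_4k:.5f} UI")
print(f"std ratio: {std_1k / std_4k:.2f}")
```

In this model the mean stays near the true skew of 0.02 UI even though the jitter is half of that value, and quadrupling the number of samples roughly halves the standard deviation.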

III. EXPERIMENTAL SETUP

We have fabricated a test chip in a 65 nm process node (Fig. 7). The test structures essentially consist of a number of samplers, buffers and a multiplexer, which provide the front-end for the skew estimator shown in Fig. 4. The two clock inputs whose skew we want to measure are supplied from outside the chip so that calibrated skews can be introduced and the performance of the technique can be studied. Similarly, the reference clock was also provided from outside to enable experimentation with different values of $T$ and $\Delta T$. The beat signal outputs from the multiplexers were directly taken out of the chip and processed in an FPGA board, so that various de-bouncing algorithms and digital processing options could be experimented with in a flexible manner.

Fig. 7. Chip layout of the delay measurement front-end taped out in 65 nm industrial CMOS process.

Fig. 8. Measurement setup in the lab.

Since we did not have signal generators that can generate two input clocks with skews at sub-picosecond resolution, we synthesize such delays using a cable of fixed length and a varying clock frequency. A single clock source is passed through a cable whose length is chosen to provide a delay of about one clock period. The relative delay between the edges at the input and output of the cable is then given by the difference between the propagation delay through the cable and a clock period. Thus, precise delays between the edges at the input and output of the cable can be obtained by adjusting the clock period (Fig. 8). Good quality RF coaxial cables are chosen to ensure that the signals through them are not distorted due to attenuation of high-speed components.
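The cable-based delay synthesis above amounts to a one-line relation between skew, cable delay and clock frequency. The following is a minimal sketch of that arithmetic; the 4 ns cable delay and the function names are hypothetical and are not the values used in the lab setup.

```python
def synthesized_skew(cable_delay_s, clock_freq_hz):
    """Delay between cable input and output edges: cable delay minus one period."""
    return cable_delay_s - 1.0 / clock_freq_hz


def frequency_for_skew(cable_delay_s, skew_s):
    """Clock frequency at which the cable produces the requested skew."""
    return 1.0 / (cable_delay_s - skew_s)


CABLE_DELAY = 4e-9   # hypothetical 4 ns cable (an assumption of this sketch)

f_zero = frequency_for_skew(CABLE_DELAY, 0.0)
f_1ps = frequency_for_skew(CABLE_DELAY, 1e-12)
print(f"{f_zero / 1e6:.4f} MHz gives zero skew")
print(f"{f_1ps / 1e6:.4f} MHz gives 1 ps skew")
print(f"frequency step per picosecond: {(f_1ps - f_zero) / 1e3:.1f} kHz")
```

With these assumed numbers, a one picosecond change in skew corresponds to a frequency step of tens of kilohertz, which is far coarser than the frequency resolution of a typical signal generator; this is what makes sub-picosecond delay steps practical with this method.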

For the two clock sources, two vector signal generators (R&S SM300 and Agilent E4428C) were used. The sinusoidal signals were converted to square waves using a high speed comparator, the ADCMP562. The rms jitter at the input of the test chip was measured to be 30 ps.

The highest frequency of operation was limited by the I/O specifications of the test chip and the peripheral devices for reliable operation. Hence, the cable delay and the input delay range were chosen so that they give zero input delay to the test chip around that frequency.

Fig. 9. Input-output delay characteristic for an input delay range of ±1 FO4 delay and the residual error after linear fit.

IV. MEASUREMENT RESULTS

Fig. 9 shows the measured output of the delay measurement unit as a function of the input delays, and the residue plot after doing a linear fit, for a fixed choice of the parameters $T$, $\Delta T$ and $k$. To remove the delay offset due to routing from the chip pads to the sampler inputs, the input delay was swept until the output delay was measured to be zero. Within the test chip too, by shorting the sampler inputs, the input delay could be set to zero, which also gave a measured delay very close to zero (this actually gives the delay due to sampler offset). In all graphs which report results for externally provided delays, we have canceled the delay offsets due to the routing from chip pads to sampler inputs. For an input delay range of around ±1 FO4 delay (±20 ps), the measured standard deviation for each point varies between 0.2 and 0.3 ps. The measured maximum error after a linear curve fit (integral error) is 0.65 ps.

For the wider input delay range of ±600 ps, the maximum integral error is 8 ps (Fig. 10). We have also tested with larger input delay ranges of ±1.5 ns, which results in an integral error of 40 ps. The larger integral errors for higher input delay ranges are due to fluctuations in the rise time of the input signal to the chip. This has been verified using an Agilent 54854A (4 GHz, 20 GSa/s) oscilloscope to observe the signals at the chip inputs.

Fig. 11 shows the measured standard deviation of an on-chip delay element as a function of the number of samples for an internal delay generator, with fixed $T$ and $\Delta T$ and varying $k$. The standard deviation reduces with the square root of the number of samples up to a point, matching (3) well, and saturates at 0.14 ps for larger sample counts. This is the limit of the resolution that is achievable with our setup, leading to a ±3σ resolution of 0.84 ps. Theoretically, with the assumption of independence between each of the core clocks and the sampling clocks, the variance of the estimated skew decreases monotonically as the number of samples increases. However, in a practical setup, correlations between the core clocks (leaf nodes) show up as they are derived from the same source. Moreover, the samples across time are also correlated due to noise in the signal sources, and their statistics are time dependent. Hence, the variance of the estimated skew is lower bounded due to these correlated jitter components, which cannot be reduced by simple averaging.

Fig. 10. Measured input-output delay for an input delay range of ±600 ps.

Fig. 11. Measured standard deviation as a function of number of samples for asynchronous case.

Fig. 12. Measured standard deviation as a function of number of samples for rationally related sampling clock case with FM, for different modulation indices.

Adding jitter to the signals does not affect the resolution and in fact improves resolution in certain cases. For example, consider the case when the sampling clock is rationally related to the measurement clock. In this case, the measurement resolution is limited, since the sampling edges revisit only a finite set of phases of the measured clock. However, it can be increased to almost the same level as that of the asynchronous sampling case by using frequency modulation on the sampling clock. This acts as jitter and randomizes the sampling edges, mimicking the asynchronous case. For validating this in the lab, we connected the reference out signal of the core clock source to the reference in point of the sampling clock source used in Fig. 8. With this change, both the sources are frequency locked to a single crystal, and using their internal PLLs they generate rationally related core and sampling clock frequencies. The sampling clock source has the option of providing frequency modulation, which is used to create artificial jitter. Fig. 12 shows the standard deviation of the measured delay of an internal delay source with and without frequency modulation. The error in measured delays is large for the case of no frequency modulation. However, the measured standard deviation decreases as the square root of the number of samples, similar to the asynchronous case of (3), for an FM deviation of 5 kHz. The results are the same for modulation frequencies of 20 Hz or 100 Hz, except that for smaller modulation indexes the minimum standard deviation obtainable was 0.3 ps, whereas with higher FM deviations of 5 and 80 kHz it could be improved to 0.14 ps. Such a phenomenon, where noise improves resolution, is well known in threshold systems [13]. Fig. 13 shows the measured delay of an internal delay element across different values of $\Delta T$ for a fixed $T$. The error becomes high for the integer ratio cases. The number of samples for each measurement is kept above a fixed minimum. With frequency modulation, even for the integer ratio case, the delay error reduces to that for other values of $\Delta T$, as shown in Fig. 13.

Fig. 13. Measured delay of internal delay element across different values of $\Delta T$ for a fixed $T$, with and without applying frequency modulation.

Besides the averaging error, the other sources of error in the skew measurement are the mismatches in the input offset voltage of the two samplers and the skews in the sampling clock. With a given voltage offset between the two samplers and a given input slew rate, the error introduced is the offset divided by the slew rate, and needs to be minimized by careful sizing and fast edge rates. If the two test nodes are close by, then we can also apply an offset compensation scheme like that in [8], by giving zero delay inputs to the samplers and compensating their offsets by digitally calibrating their trigger points. The skew at the sampling clock inputs of the two samplers must be minimized by careful routing. This component of error is bound to be present in any scheme where the skew between two distant nodes is to be measured, as that necessitates a reference clock.

As a simple application of this technique, we measure the setup window mismatches of samplers using the test structure shown in Fig. 14(a). Eight pairs of samplers are fed by the same input signal and each pair's output is fed to the delay measurement unit. Note that a single delay measurement unit can be used to measure delays across many pairs of nodes, thus reducing the area overhead significantly. Since the samplers are laid out in close proximity, there is no input skew in the data or the clock inputs, and what is measured is the effect of sampler mismatches. Fig. 14(b) shows the measured sampler mismatch for the eight pairs to be within 1 ps, with a standard deviation of 0.14 ps. The rather large offset of the sampler mismatches is due to the poor rise time of the sampling clock, which was confirmed with post layout simulations.

Fig. 14. Characterization of sampler mismatch. (a) Setup to measure mismatch between eight pairs of samplers. (b) Measured results.

In another experiment, we measured the supply voltage dependency of the delay of an internal buffer using the setup shown in Fig. 15(a). The measured delay in Fig. 15(b) was close to the layout-simulated delay of the internal buffer. Fewer samples were required in this measurement compared to those taken for Figs. 9 and 10, since in this case the input delay, being that of an internal buffer, is more stable than the externally fed cable delays. As shown in the figure, delay increments of less than 1 ps can be resolved.

Fig. 15. Characterization of dependence of delay of an internal buffer on supply voltage. (a) Setup to measure supply voltage dependence of delay of an internal buffer. (b) Results obtained.
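The effect of a rationally related sampling clock, discussed earlier in this section, can be reproduced with a toy model. The sketch below assumes ideal (jitter-free) edges on the measured clock and a sampling phase that advances by exactly 1/q of a unit interval per sample. Gaussian phase noise is used as a crude stand-in for frequency modulation; this is a simplification chosen for the sketch, not the modulation used in the experiment, and all names and values are hypothetical.

```python
import random


def skew_estimate(q, skew_ui, n_samples, phase_jitter_ui, seed=0):
    """Fraction of samples landing between two ideal edges at 0 and skew_ui."""
    rng = random.Random(seed)
    hits = 0
    for i in range(n_samples):
        phase = (i % q) / q                       # only q distinct phases occur
        if phase_jitter_ui > 0.0:
            phase += rng.gauss(0.0, phase_jitter_ui)
        u = phase % 1.0                           # wrap into one unit interval
        if u < skew_ui:                           # sample falls between the edges
            hits += 1
    return hits / n_samples


locked = skew_estimate(q=8, skew_ui=0.03, n_samples=80000, phase_jitter_ui=0.0)
dithered = skew_estimate(q=8, skew_ui=0.03, n_samples=80000, phase_jitter_ui=0.2)
print(locked)     # 0.125: only the phase-0 sample ever lands in the window
print(dithered)   # close to the true skew of 0.03 UI
```

Without added phase noise the estimate is quantized to multiples of 1/q no matter how many samples are taken; once the sampling phase is randomized, averaging can again resolve skews much smaller than 1/q.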

The measurement time is $2^k$ sampling clock cycles and we can relate it to the standard deviation, and hence the resolution, from (3), as

(4)

The measurement time per conversion is around 140 ms in order to obtain a ±3σ resolution of 0.84 ps. In the case of shared measurement across different leaf nodes, the measurement time increases in proportion to the number of leaf nodes for which the measurement has to be performed. Hence, there exists a trade-off between the area overhead and the measurement time. But since the skew measurement unit requires relatively small area (around 1 K NAND2 equivalent gates) and does not need any placement constraints, multiple copies of it can be replicated to reduce the total measurement time for shared measurement cases. Because of the relatively small gate count and very low activity factor (since the data inputs are subsampled inputs), the power consumed in the delay measurement module is quite insignificant.
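The trade-off between measurement time and the number of delay measurement units can be made concrete with a short calculation. This is a minimal sketch assuming that one conversion takes $2^k$ sampling clock cycles and that conversions for different node pairs run one after another on each unit; the sampling period, the value of `k`, the counting in node pairs, and the function names are all hypothetical.

```python
from math import ceil


def conversion_time(k, sampling_period):
    """Time for one skew conversion: 2**k sampling clock cycles."""
    return (1 << k) * sampling_period


def total_time(k, sampling_period, n_pairs, n_units=1):
    """Total time when n_units measurement units share n_pairs node pairs."""
    return ceil(n_pairs / n_units) * conversion_time(k, sampling_period)


# Hypothetical values: sampling period in picoseconds, k chosen arbitrarily.
TS_PS = 4004
K = 20

print(conversion_time(K, TS_PS) / 1e9, "ms per conversion")
print(total_time(K, TS_PS, n_pairs=8) / 1e9, "ms for 8 pairs, one unit")
print(total_time(K, TS_PS, n_pairs=8, n_units=3) / 1e9, "ms for 8 pairs, three units")
```

Each increment of `k` doubles the conversion time, while replicating the measurement unit divides the number of sequential conversions, which is the area versus time trade-off described above.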


V. CONCLUSION

Asynchronous subsampling followed by statistical averaging allows measurement of static skews between periodic signals. The proposed techniques of de-bouncing followed by averaging of the arithmetic difference of the signals remove any dependency of resolution on sampling clock jitter, unlike in previous works. Measured results from a 65 nm test chip indicate the ability to measure skews with a ±3σ resolution of 0.84 ps and an integral error of 0.65 ps for an input skew range of ±1 FO4 delay (±20 ps). The technique can also be used to measure larger skews, up to a limit set by the clock period $T$. The precision is unaffected by clock jitter, as a measurement resolution of 0.84 ps is obtained with clock sources with 30 ps rms jitter. This is further validated by experiments where frequency modulation on the sampling clock preserves the resolution. In fact, in certain cases where the sampling clock is rationally related to the core clock, frequency modulation improves resolution, which is otherwise degraded.

APPENDIX

In this appendix, we sketch the derivations of (2) and (3) of the main text. Let […], […] be the times within a clock period when the data clocks […], […], respectively, cross the logic-high threshold, and let […] be the time when the sampling clock crosses the sampling threshold. Due to jitter, these are random variables. Without loss of generality, let the mean of […] be zero. The mean of […] is […], the quantity to be estimated. Let […] and let

(5)

where […] is the mean value of […], and […] is the random component.

It is of interest to determine the probability that the samplers sample a logic high. A sampler samples a logic high if the sampled clock edge occurs earlier than the sampling clock edge. Hence

(6)

Let […]. Let […] be the CDF of […]. From (6)

(7)

Let […]. Let […] be the CDF of […]. Then

(8)

The output of the delay measurement unit (Fig. 1) is given as

(9)

with […], the difference of the ith samples. It follows that

(10)

where […] is the sampling instant.

Let the clock period be T and the sampling clock period be […], where […] is an integer and […]. This causes the sampling edge to fall uniformly across the entire period of the sampled signal to create one beat period. Let the measurement be taken over […] beat periods, so […]. Hence, (9) can be rewritten as

(11)

Let […] be the starting phases in each beat period. Then

(12)

Substituting from (10), applying the law of iterated expectation, and reordering the summation, we get

(13)

Since the […]s are uniform over 0 to […] (with PDF of […]), the inner expectation is identical for each […] and can be evaluated as the following integral:

(14)

The above summation can be replaced by an integral over the entire clock period T. However, if we assume that the skew and the jitter of the clocks are small compared to the period, then the limits of the integration can be replaced by […], as

(15)

In general, evaluating this integral is difficult. However, in this particular case, we can resort to the trick of differentiating (15) with respect to […]:

(16)

The term inside the integral is a PDF and integrates to unity, which proves (2).
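The unbiasedness argument can be checked numerically. The following Python sketch is a Monte Carlo illustration under assumptions that are not taken from the measured system: Gaussian, mutually independent jitter on all three clocks, a sampling phase drawn uniformly over one clock period centred on the rising edges (standing in for the beat-period sweep after masking), and a clock period of 1000 ps. The function name `estimate_skew_ps` and all parameter values are hypothetical.

```python
import random


def estimate_skew_ps(true_skew_ps, period_ps=1000.0, jitter_rms_ps=30.0,
                     num_samples=500_000, seed=1):
    """Estimate the skew between two clocks from subsampled 0/1 values.

    Assumed model: only rising edges matter (masking already applied), so
    each sampler outputs 1 when its clock edge precedes the sampling edge.
    """
    rng = random.Random(seed)
    half = period_ps / 2.0
    diff_sum = 0
    for _ in range(num_samples):
        # Sampling instant: uniform over one period, plus sampling-clock jitter.
        ts = rng.uniform(-half, half) + rng.gauss(0.0, jitter_rms_ps)
        # Jittered rising edges: clock 1 nominally at 0, clock 2 at the skew.
        t1 = rng.gauss(0.0, jitter_rms_ps)
        t2 = true_skew_ps + rng.gauss(0.0, jitter_rms_ps)
        q1 = 1 if t1 < ts else 0
        q2 = 1 if t2 < ts else 0
        diff_sum += q1 - q2  # arithmetic difference of the two samples
    # In this model the mean difference is the skew as a fraction of the period.
    return period_ps * diff_sum / num_samples


if __name__ == "__main__":
    print(estimate_skew_ps(5.0))
```

In this model the averaged difference, scaled by the period, should land close to the injected skew (including its sign) even though the jitter is several times larger than the skew; the residual error shrinks as `num_samples` grows.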


The variance of […] can be bounded as

(17)

From (9) and (17)

(18)

from which the bound on the standard deviation of S given in (3) follows.
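To illustrate how precision is tied to measurement time, the sketch below uses a deliberately loose bound that is an assumption of this example rather than the expression in (17) or (3). Each difference sample lies in {-1, 0, +1}, so its variance is at most 1, and for N independent samples the standard deviation of their mean is at most 1/sqrt(N). Scaling by the clock period to convert to time is also an assumption, as are the function names and the 1000 ps period.

```python
import math


def worst_case_sigma_ps(num_samples, period_ps):
    """Loose upper bound on the standard deviation of the skew estimate.

    Assumes independent difference samples with variance at most 1 and an
    estimate formed as (mean difference) * period.
    """
    return period_ps / math.sqrt(num_samples)


def samples_for_sigma(target_sigma_ps, period_ps):
    """Samples sufficient, under the same loose bound, for a target sigma."""
    return math.ceil((period_ps / target_sigma_ps) ** 2)


if __name__ == "__main__":
    for n in (1_000_000, 4_000_000, 16_000_000):
        print(n, worst_case_sigma_ps(n, 1000.0))
```

The bound falls as the inverse square root of the number of samples, so quadrupling the measurement time halves it; the actual spread is smaller whenever most difference samples are zero.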

ACKNOWLEDGMENT

The authors gratefully acknowledge the help of Dr. V.

Viswanathan and J. Sridhar from Texas Instruments, Bangalore

for their support in chip fabrication; V. Janakiraman for help

during chip design; V. Syam for help in board design and J.

Balaji for discussions.


Bharadwaj Amrutur (M’08) received the B.Tech. degree in computer science and engineering from Indian Institute of Science, Bombay, India, in 1990 and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Palo Alto, CA, in 1994 and 1999, respectively.

He has worked at Bell Labs, Agilent Labs and Greenfield Networks. He is currently an Assistant Professor in the Department of Electrical Communication Engineering at Indian Institute of Science, Bangalore, India, working in the areas of VLSI Circuits and Systems.

Pratap Kumar Das (S’08) received the B.Tech. degree in instrumentation and electronics engineering from College of Engineering and Technology, Bhubaneswar, India, in 2005. Since 2005, he has been pursuing the Ph.D. degree at the Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore.

One of his papers was selected to be among the Noteworthy Technical Papers in the Asian Solid-State Circuit Conference, 2008. His research interests include custom digital circuit design for measuring and mitigating variability in deep submicron CMOS technologies.

Rajath Vasudevamurthy (S’09) received the B.E. degree in electronics and communication engineering from R. V. College of Engineering, Bangalore, India, in 2007. Since 2007, he has been working toward the Ph.D. degree at the Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore.

His research interests include VLSI circuits and systems, mathematical modeling, and communication systems.
