
Domain Extender for Collision Resistant Hash Functions: Improving Upon Merkle-Damgård Iteration

Palash Sarkar

Cryptology Research Group

Applied Statistics Unit

Indian Statistical Institute

203, B.T. Road, Kolkata

India 700108

palash@isical.ac.in

Abstract

We study the problem of securely extending the domain of a collision resistant compression function. A new construction based on directed acyclic graphs is described. This generalizes the usual iterated hashing constructions. Our main contribution is to introduce a new technique for hashing arbitrary length strings. Combined with DAG based hashing, this technique gives a new hashing algorithm. The amount of padding and the number of invocations of the compression function required by the new algorithm are smaller than those of the general Merkle-Damgård algorithm. Lastly, we describe the design of a new parallel hash algorithm.

Keywords: hash function, compression function, composition principle, collision resistance, directed acyclic graph.

1 Introduction

Hash functions are a basic cryptographic primitive and are used extensively in digital signature protocols. For such applications, a hash function must satisfy certain necessary properties including collision resistance and pre-image resistance. Collision resistance implies that it should be computationally intractable to find two elements in the domain which are mapped to the same element in the range. On the other hand, pre-image resistance means that given an element of the range, it should be computationally intractable to find its pre-image.

The construction of collision resistant and pre-image resistant hash functions is of both practical and theoretical interest. Most practical hash functions are designed from scratch. The advantage of designing a hash function from scratch is that one can use simple logical/arithmetic operations to design the algorithm and hence achieve very high speeds. The disadvantage is that we obtain no proof of collision resistance. Hence a user has to assume that the function is collision resistant. A well accepted intuition in this area is that it is more plausible to assume a function to be collision resistant when the domain is fixed (and small) rather than when it is infinite (or very large). A fixed domain function which is assumed to be collision resistant is often called a compression function.

For practical use, it is required to hash messages of arbitrary lengths. Hence one must look for methods which extend the domain of a compression function in a “secure” manner, i.e., the extended domain hash function is collision resistant provided the compression function is collision resistant. Any method which achieves this is often called a composition principle.


Composition principles based on iterated applications of the compression function are known and these are called variants of the Merkle-Damgård algorithm [2, 4]. The most general of these algorithms can hash arbitrarily long messages and assumes the compression function to be only collision resistant. Other variants can hash messages only up to a maximum possible length or assume the compression function to be both collision resistant and one-way. See Section 3 for a detailed discussion of several variants of the Merkle-Damgård algorithm.

Our Contributions: In this paper, we are concerned with the problem of constructing a hash function which can hash arbitrarily long messages and which can be proved to be collision resistant under the assumption that the compression function is collision resistant. To justify the non-triviality of the problem we describe a construction which can be proved to be secure if the compression function is both collision resistant and one-way while it is insecure if the compression function is only collision resistant.

The first step in our construction is to consider a very general class of domain extending algorithms. The structure of any algorithm in the class that we consider can be described using a directed acyclic graph (DAG). In Section 5, we provide a construction of a secure domain extending algorithm using an arbitrary DAG. The Merkle-Damgård algorithm uses a dipath and is a special case of DAG based algorithms.

Our main contribution (in Section 6) is to provide a solution to the problem of hashing arbitrary length strings for DAG based algorithms. Our algorithm improves upon the (general) Merkle-Damgård algorithm both in terms of padding length and number of invocations. Our construction can be proved to be collision resistant under the assumption that the compression function is only collision resistant.

In Section 8, we provide some concrete examples of hashing structures and show that these can be combined nicely to design a parallel hash function. We note, however, that we do not provide a detailed specification of an actual hash function. Such a specification will necessarily involve many practical and implementation issues which are not really within the scope of the current work.

A theoretical justification of our work is provided by the fact that our results improve upon a fifteen year old classical work. Since our work improves upon the Merkle-Damgård algorithm, a natural question is whether further improvements are possible. This naturally leads to the problem of obtaining non-trivial lower bounds (and optimal algorithms) on padding lengths and number of invocations. These problems can provide motivation for future research.

2 Preliminaries

We write |x| for the length of a string x and x_1||x_2 for the concatenation of two strings x_1 and x_2. The reverse of the string x will be denoted by x^r. By an (n,m) function we will mean a function which maps {0,1}^n to {0,1}^m. All logarithms in the paper are in base two.

For n > m, let h be an (n,m) function. Two n-bit strings x and x' are said to collide for h if x ≠ x' but h(x) = h(x'). A hash function h : X → Y is said to be collision resistant if it is computationally intractable to find collisions for h. A formal definition of this concept requires the consideration of a family of functions (see [2, 5]).

In this paper, we are interested in “securely” extending the domain of a hash function. More precisely, given an (n,m) function h : {0,1}^n → {0,1}^m, with n > m+1, we construct a function h^∞ : ∪_{i≥1} {0,1}^i → {0,1}^m, such that one can prove the following: Given any collision for h^∞, it is possible to obtain a collision for h. The last statement is formalized in terms of a Turing reduction between two suitably defined problems (see below). The advantage of this method is that we only prove a reduction and at no point are we required to use a formal definition of collision resistance. This approach has been previously used in the study of hash functions [6].


We now turn to the task of defining our approach to reducibilities between different problems related to the property of collision resistance. Consider the following problem as defined in [6].

Problem  : Collision Col(n,m)
Instance : An (n,m) hash function h.
Find     : x, x' ∈ {0,1}^n such that x ≠ x' and h(x) = h(x').

By an (ε,q) (probabilistic) algorithm for Collision we mean an algorithm which invokes the hash function h at most q times and solves Col(n,m) with probability of success at least ε.
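As an illustration of the (ε,q) notation, the following Python sketch gives a deterministic algorithm for Col(n,m) on a toy function. Everything concrete here is an assumption made for the example and is not taken from the paper: bit strings are modelled as Python strings of '0' and '1', the parameters n = 16 and m = 8 are tiny, and toy_h (a truncated SHA-256) merely stands in for an (n,m) function. By the pigeonhole principle, q = 2^m + 1 distinct queries always produce a collision, so this is a (1, 2^m + 1) algorithm; for realistic m such a q is of course infeasible.

```python
import hashlib
from itertools import product

N, M = 16, 8  # toy parameters (assumed): h maps 16-bit strings to 8-bit strings


def toy_h(x: str) -> str:
    """Hypothetical (N, M) function: first M bits of SHA-256 of the bit string."""
    assert len(x) == N
    digest = hashlib.sha256(x.encode("ascii")).digest()
    return "".join(f"{byte:08b}" for byte in digest)[:M]


def find_collision(h, n, q):
    """Query h on at most q distinct n-bit inputs; return a colliding pair or None."""
    seen = {}  # maps each digest seen so far to one input producing it
    for count, bits in enumerate(product("01", repeat=n)):
        if count >= q:
            return None
        x = "".join(bits)
        y = h(x)
        if y in seen:
            return seen[y], x
        seen[y] = x
    return None


# 2^M + 1 distinct inputs cannot all have distinct M-bit digests.
pair = find_collision(toy_h, N, q=2**M + 1)
```
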

The domain of h is the set of all n-bit strings. We would like to extend the domain to the set of all nonempty binary strings, i.e., to construct a function h^∞ : ∪_{i≥1} {0,1}^i → {0,1}^m. We would like to relate the difficulty of finding collisions for h^∞ to that of finding collisions for h. Thus, we consider the following problem.

Problem  : Arbitrary length collision ALC(n,m,L)
Instance : An (n,m) hash function h and an integer L ≥ 1.
Find     : x, x' ∈ ∪_{i=1}^{L} {0,1}^i such that x ≠ x' and h^∞(x) = h^∞(x').

By an (ε,q,L) (probabilistic) algorithm A for Arbitrary length collision we will mean an algorithm that makes at most q invocations of the function h and solves ALC(n,m,L) with probability of success at least ε.

Later we show Turing reductions from Collision to Arbitrary length collision. Informally, this means that given oracle access to an algorithm for solving ALC(n,m,L) for h^∞ it is possible to construct an algorithm to solve Col(n,m) for h. These reductions will show that our constructions preserve the intractability of finding collisions.

Pre-image resistance: This is an important property for cryptographic hash functions. Informally, this means that given y ∈ {0,1}^m, it is computationally infeasible to find an x such that h(x) = y. Pre-image resistance (or one-wayness) is a crucially important property on its own. On the other hand, this property is sometimes used to prove security of domain extending techniques for collision resistant hash functions. Suppose the domain of an (n,m) hash function h is extended to obtain the hash function h^∞. For certain constructions [2], one can show that h^∞ is collision resistant if h is both collision resistant and one-way. We would like to emphasize that this is not the approach we will take in this paper. In our constructions, we will assume h to be only collision resistant.

3 Iterated Hashing

In this section, we briefly review iterative techniques for extending the domain of a collision resistant compression function. These techniques are attributed to [4, 2] and are commonly called the Merkle-Damgård constructions.

Let h be an (n,m) compression function and IV be an m-bit string. Each of the domain extending methods described below uses IV and h to construct a new function which can hash “long” strings to obtain an m-bit digest. The IV can be chosen randomly, but once chosen it cannot be changed and becomes part of the specification for the extended domain hash function.


3.1 Construction I: Basic Iteration

We define a hash function H^(I) whose domain consists of all binary strings whose length is a multiple of (n − m). Let x be a message whose length is i(n − m) for some i ≥ 1. We write x = x_1||···||x_i, where each x_j is a string of length (n − m). Define z_1 = h(IV||x_1) and for j > 1, define z_j = h(z_{j−1}||x_j). The digest of x under H^(I) is defined to be z_i, i.e., H^(I)(x) = z_i.

The function H^(I) can be proved to be collision resistant. Briefly, the argument proceeds as follows. Suppose x and x' are two strings such that x ≠ x' and H^(I)(x) = H^(I)(x'). If we have |x| = |x'|, then an easy backward induction shows that there must be a collision for the function h. On the other hand, if |x| ≠ |x'|, then it can be argued that the collision for H^(I) either leads to a collision for h or a pre-image of IV under h. Thus, if we assume that h is both collision resistant and pre-image resistant, then H^(I) is collision resistant.
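The iteration of Construction I can be sketched in a few lines of Python. This is only an illustration under stated assumptions: bit strings are Python strings of '0' and '1', the parameters n = 24 and m = 8 and the all-zero IV are arbitrary toy choices, and toy_h (a truncated SHA-256) is a hypothetical stand-in for the compression function h.

```python
import hashlib

N, M = 24, 8      # toy parameters (assumed); each message block has N - M = 16 bits
IV = "0" * M      # an arbitrary fixed m-bit IV (assumed)


def toy_h(z: str) -> str:
    """Hypothetical (N, M) compression function: truncated SHA-256 of a bit string."""
    assert len(z) == N
    digest = hashlib.sha256(z.encode("ascii")).digest()
    return "".join(f"{byte:08b}" for byte in digest)[:M]


def hash_I(x: str) -> str:
    """Construction I: z_1 = h(IV||x_1), z_j = h(z_{j-1}||x_j); the digest is z_i."""
    block = N - M
    if len(x) == 0 or len(x) % block != 0:
        raise ValueError("|x| must be a positive multiple of n - m")
    z = IV
    for j in range(0, len(x), block):
        z = toy_h(z + x[j:j + block])
    return z
```

A single-block message x is hashed with one invocation, hash_I(x) = toy_h(IV + x), and every further block costs exactly one more invocation.
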

3.2 Construction II: General Construction

Our description of the general version (which appears in [2]) is from [7] for the case n − m > 1. (The case n − m = 1 is a little more complicated. We do not mention it here since we will not consider such values of n and m for our constructions.)

Let H^(II) be the extended domain hash function which is to be defined. Let x be a message to be hashed; we have to define the digest H^(II)(x). Write x = x_1||x_2||...||x_k, where |x_1| = |x_2| = ··· = |x_{k−1}| = n − m − 1 and |x_k| = n − m − 1 − d with 0 ≤ d ≤ n − m − 2. For 1 ≤ i ≤ k − 1, let y_i = x_i; y_k = x_k||0^d and y_{k+1} is the (n − m − 1)-bit binary representation of d. Define z_1 = h(IV||0||y_1) and for 1 ≤ i ≤ k, define z_{i+1} = h(z_i||1||y_{i+1}). The digest of x under H^(II) is z_{k+1}, i.e., H^(II)(x) = z_{k+1}.

Note that the domain consists of all possible binary strings, i.e., there is no length restriction on the input message x. It can be shown that H^(II) is collision resistant assuming h to be only collision resistant. (See [7] for a proof.)
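The block formatting of Construction II can be made concrete with the following Python sketch. The same caveats apply as for any toy example: bit strings are modelled as strings of '0' and '1', the values n = 24, m = 8 and the all-zero IV are assumptions, and toy_h is a hypothetical stand-in for h. The function also returns the number of invocations so that the count 1 + ⌈|x|/(n − m − 1)⌉ can be checked.

```python
import hashlib

N, M = 24, 8      # toy parameters (assumed); each block carries N - M - 1 = 15 message bits
IV = "0" * M      # an arbitrary fixed m-bit IV (assumed)


def toy_h(z: str) -> str:
    """Hypothetical (N, M) compression function: truncated SHA-256 of a bit string."""
    assert len(z) == N
    digest = hashlib.sha256(z.encode("ascii")).digest()
    return "".join(f"{byte:08b}" for byte in digest)[:M]


def hash_II(x: str):
    """Construction II. Returns (digest, number of invocations of h)."""
    if not x:
        raise ValueError("the domain is the set of nonempty binary strings")
    b = N - M - 1
    blocks = [x[i:i + b] for i in range(0, len(x), b)]   # x_1, ..., x_k
    d = b - len(blocks[-1])                              # 0 <= d <= n - m - 2
    blocks[-1] += "0" * d                                # y_k = x_k || 0^d
    blocks.append(format(d, f"0{b}b"))                   # y_{k+1} encodes d
    z = toy_h(IV + "0" + blocks[0])                      # first call is tagged with bit 0
    for y in blocks[1:]:
        z = toy_h(z + "1" + y)                           # later calls are tagged with bit 1
    return z, len(blocks)
```

For example, a 15-bit message needs two invocations while a 16-bit message needs three, since the final block encoding d is always processed separately.
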

3.3 Construction III: SHA Family Construction

The specification of the SHA family of constructions uses a variant of the iterative hashing technique. We denote this variant by H^(III).

Let x be the message to be hashed. First we form the string: pad(x) = x||1||0^k||bin_c(|x|), where c is a constant such that c < n−m, bin_c(|x|) is the c-bit binary representation of |x| and k is the least non-negative integer such that |x| + 1 + k ≡ (n − m − c) mod (n − m), or equivalently |x| + c + 1 + k ≡ 0 mod (n − m). The length of pad(x) is equal to l(n − m) for some l ≥ 1. (For SHA-256, n = 768, m = 256 and c = 64.) The message digest is defined to be H^(III)(x) = H^(I)(pad(x)).

This construction can only handle messages of lengths less than 2^c. Putting c = 64 (as in SHA-256) is usually sufficient for all practical purposes. The maximum amount of padding is n−m which is a constant, i.e., independent of the message length.

3.4 Construction IV: Another Length Bounded Construction

We define a function H^(IV) which, like H^(III), can also hash all binary strings up to a maximum possible length.

Let the message be x. Append the minimum number of zeros to x so as to make the length a multiple of (n−m). Now divide the padded message into l blocks x_0,...,x_{l−1} of (n−m) bits each. Define y_0 = h(IV||x_0) and for 1 ≤ i ≤ l − 1, define y_i = h(y_{i−1}||x_i). Finally define z = h(y_{l−1}||w), where w is the (n − m)-bit binary representation of |x|, i.e., w = bin_{n−m}(|x|). The digest of x is z. Clearly, this algorithm can be applied only when the length of x is less than 2^{n−m}. Again, this construction can be proved to be collision resistant assuming h to be only collision resistant.

Table 1: Comparison of features of different constructions for a message x.

Cons.  domain sz.       length res.            padding                     # invoc.                      assumption on h()
I      infinite         |x| = i(n − m), i ≥ 1  none                        |x|/(n − m)                   c.r. and one-way
II     infinite         none                   2n − m − 2 + ⌈|x|/(n−m−1)⌉  1 + ⌈|x|/(n−m−1)⌉             c.r.
III    2^c, c < n − m   |x| < 2^c, c < n − m   n − m                       a + ⌈|x|/(n−m)⌉, a ∈ {0,1}    c.r.
IV     < 2^{n−m}        |x| < 2^{n−m}          2n − m − 1                  1 + ⌈|x|/(n−m)⌉               c.r.

3.5 Role of IV

Each of the constructions described above uses an m-bit string as an IV. The IV is essential in Construction I, since in this construction we require h to be such that it is infeasible to find a pre-image of IV under h. On the other hand, for Constructions II to IV, we can replace IV by the initial m bits of the message without affecting the collision resistance of the extended domain hash function. If we do this, then in certain cases, we can hash an extra m bits without increasing the number of invocations of h. In general, this is not a significant gain, though it may become significant if we repeatedly hash short messages such as digital certificates.

3.6 Discussion

In Table 1, we compare the properties of the different constructions. For each construction, we provide the size of the extended domain; the restriction on the lengths of messages to be hashed; the maximum amount of padding; the maximum number of invocations of h() that are made while extending the domain; and the security assumption made on h(). (In our count of the number of padded bits, we also include the IV.) The first construction is proved to be collision resistant under the assumption that h() is both collision resistant and one-way, while the other three constructions can be proved to be collision resistant under the assumption that h() is only collision resistant. Construction II can handle arbitrary length strings, while Constructions III and IV can handle only bounded length strings. On the other hand, Constructions III and IV are more efficient than Construction II.

Question: The theoretical question that now arises is whether it is possible to obtain a construction which can handle arbitrary length strings, whose collision resistance is based only on the collision resistance of h and which is more efficient than Construction II?

4 Difficulty of Domain Extension

We would like to provide some evidence that it is non-trivial to obtain an answer to the question raised in Section 3.6. It is often believed that “padding with the length at the end is sufficient to ensure collision resistance”. Investigating such a claim in full generality is difficult. Instead, we consider a “natural” extension of Construction III (the SHA family construction) to arbitrary length strings and show the following two facts.

• It is correct if we assume h to be both collision resistant and one-way on IV, but

• It is incorrect when we assume h to be only collision resistant.

For an integer i, let bin(i) denote the minimum length binary representation of i and for a binary string x, let χ(x) denote the minimum length binary representation of the length of x, i.e., χ(x) = bin(|x|).

Construction V: We want to define a function H^(V) which can handle arbitrary length strings. As in Construction III, define

pad(x) = x||1||0^k||bin(|x|)
       = x||1||0^k||χ(x)

where k is the minimum non-negative integer which satisfies the equation |x| + |χ(x)| + 1 + k ≡ 0 mod (n−m). This ensures that the length of pad(x) is equal to l(n − m) for some l ≥ 1 and hence we can apply the iterative technique as in Construction III to compute the message digest. (The exact Construction III is obtained by substituting bin_c(|x|) for bin(|x|).)

The digest of x under H^(V) is defined to be H^(I)(pad(x)), i.e., H^(V)(x) = H^(I)(pad(x)). Since we do not put any bound on the length of bin(|x|), this construction can handle arbitrary length strings. Let us now consider the correctness of Construction V.

Condition 1: Suppose h is both collision resistant and it is infeasible to find a pre-image of IV. Then, using an argument as in the case of Construction I, it is possible to show by a backward induction that a collision for H^(V) either provides a collision for h or a pre-image of IV under h.

Condition 2: Suppose that we want to assume h to be only collision resistant. We show that assuming h to be only collision resistant is not sufficient to show the correctness of Construction V. Let us consider the meaning of this statement in more detail. Suppose that there is some element in the range of h which has a unique pre-image. Then the ability to find this pre-image (or even knowing it a priori) does not violate the collision resistance of h. On the other hand, the knowledge of this pre-image can make it possible to construct a collision for H^(V). This is the approach that we take below.

Our first task is to choose a suitable collision resistant h. For this, we must assume that some function h'() with suitable parameters is collision resistant, as otherwise the question is moot. (See [1] for a similar situation in regard to universal one-way hash functions.)

Suppose h'() is an (n,m') collision resistant function, with m' = m−1 and n−m = 2^τ ≥ 16. Further, let IV and σ be arbitrary m-bit and (n−m)-bit strings respectively. Using h', we define an (n,m) function h for which it is infeasible to find collisions and for which IV||σ is the only pre-image of IV. Write IV = IV'||b, where |IV'| = m − 1 and b is a bit. For any n-bit string x, define

h(x) = IV               if x = IV||σ;
     = IV'||(1 − b)     if h'(x) = IV' and x ≠ IV||σ;
     = h'(x)||0         if h'(x) ≠ IV'.                          (1)

Clearly, IV has the unique pre-image IV||σ under h. On the other hand, any collision for h yields a collision for h'. Hence, h is collision resistant if h' is collision resistant. (Note that h is not surjective, but that is not relevant to the assumption that h is collision resistant.) Note that if we use Construction I to extend the domain of h, then we get a function H^(I) with the following property: H^(I)(σ^i) = IV for all i ≥ 1, where σ^i denotes i many repetitions of σ.

The conversion from h' to h works for any IV and σ. Choose IV to be an arbitrary m-bit string; and σ = y'||1||0^{τ−1}||1, where y' is an arbitrary string of length (n−m−1−τ). Then we can define the function h as above. (The justification for choosing σ as above will become clear later.)

Consider the function H^(V). This function is defined for any h and IV and hence also for h and IV defined as above. We show that for such h and IV it is possible to exhibit a collision for H^(V).

We define two strings x and x' in the following manner. String x is a “short” string, while string x' is a “long” string. Define x = 0^{n−m−1−τ}; then |χ(x)| = ⌈log(n − m − 1 − τ)⌉ = τ and hence

pad(x) = 0^{n−m−1−τ}||1||χ(x).

Note that in this case k = 0 and |pad(x)| = n − m.

We now define the string x'. First we set the length of x' by defining χ(x') = 1||pad(x) and hence |χ(x')| = n − m + 1. This sets the length of x' to be 2^{n−m} + (n − m) + |x|. At this point, we know pad(x') = x'||1||0^{τ−1}||χ(x'). This sets the length of pad(x') to be 2^{n−m} + 3(n − m). We write x' = z'||y' where |z'| = 2^{n−m} + n − m. (Note |y'| = |x|, and we could, if we like, choose y' = x.) Recall that σ = y'||1||0^{τ−1}||1 and is of length (n − m). We define z' to be i many repetitions of σ, i.e., z' = σ^i, where i = 1 + 2^{n−m−τ}. Thus, we can write

pad(x') = σ^{2+2^{n−m−τ}}||pad(x)

i.e., 2 + 2^{n−m−τ} repetitions of σ followed by pad(x). Now,

H^(V)(x') = H^(I)(pad(x'))
          = H^(I)(σ^{2+2^{n−m−τ}}||pad(x))
          = h(H^(I)(σ^{2+2^{n−m−τ}})||pad(x))
          = h(IV||pad(x))
          = H^(I)(pad(x))
          = H^(V)(x).

Clearly, x ≠ x' and hence we obtain a collision for H^(V). Thus, H^(V) is not collision resistant, even though h is collision resistant. In fact, in the proof we have used the fact that IV has a unique and known pre-image under h.

In view of this, we consider the problem of extending the domain of a collision resistant hash function to be a non-trivial problem.
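The collision described in this section can be checked mechanically for the smallest admissible parameter τ = 4, i.e., n − m = 16. The Python sketch below is an illustration under stated assumptions: bit strings are strings of '0' and '1'; the values m = 8, the all-zero IV and the choice y' = x are arbitrary; and h_prime (a truncated SHA-256) is a hypothetical stand-in for the (n, m − 1) function h'. The function h follows equation (1), and the two messages are built exactly as in the text. The long message has 2^16 + 27 bits, so the computation takes a few thousand invocations of h.

```python
import hashlib

TAU = 4
BLOCK = 2**TAU                              # n - m = 2^tau = 16
M = 8                                       # toy digest length (assumed)
N = M + BLOCK
IV = "0" * M                                # arbitrary m-bit IV (assumed)
Y = "0" * (BLOCK - 1 - TAU)                 # y' = x = 0^(n-m-1-tau)
SIGMA = Y + "1" + "0" * (TAU - 1) + "1"     # sigma = y'||1||0^(tau-1)||1


def h_prime(x: str) -> str:
    """Hypothetical (N, M-1) function standing in for h'."""
    digest = hashlib.sha256(x.encode("ascii")).digest()
    return "".join(f"{byte:08b}" for byte in digest)[:M - 1]


def h(x: str) -> str:
    """The function of equation (1): IV||sigma is the unique pre-image of IV."""
    if x == IV + SIGMA:
        return IV
    hp = h_prime(x)
    if hp == IV[:-1]:
        return IV[:-1] + ("1" if IV[-1] == "0" else "0")
    return hp + "0"


def hash_I(msg: str) -> str:
    """Basic iteration (Construction I) over blocks of n - m bits."""
    z = IV
    for j in range(0, len(msg), BLOCK):
        z = h(z + msg[j:j + BLOCK])
    return z


def chi(x: str) -> str:
    """Minimum length binary representation of |x|."""
    return format(len(x), "b")


def pad_V(x: str) -> str:
    k = (-(len(x) + len(chi(x)) + 1)) % BLOCK
    return x + "1" + "0" * k + chi(x)


def hash_V(x: str) -> str:
    return hash_I(pad_V(x))


x_short = Y                                          # the "short" string x
x_long = SIGMA * (1 + 2**(BLOCK - TAU)) + Y          # the "long" string x' = z'||y'
```

Comparing hash_V(x_short) with hash_V(x_long) confirms the collision; the argument does not depend on the particular stand-in chosen for h'.
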

5 DAG Hashing

So far, we have considered iterated hashing. Our main task will be to provide a new construction for securely extending the domain of a collision resistant hash function. We actually do this for a general class of hashing algorithms whose structure can be described using a directed acyclic graph (DAG).

A DAG D is defined as D = (V,A) where V is a finite non empty set of nodes and A is a set of arcs such that D contains no directed cycles. For any node v of D, we will denote by Γ(v) (resp. ∆(v)) the set of all arcs coming into (resp. going out of) v. It is well known (and easy to prove) that any DAG contains at least one node of indegree zero and at least one node of outdegree zero. We make the following definition.

Definition 1 Let D = (V,A) be a DAG. A node with indegree zero will be called an exposed node; a node with outdegree zero will be called an output node and all other nodes will be called internal nodes.

If v is an exposed node and u is an output node, we have Γ(v) = ∆(u) = ∅. Given a DAG D, let l(D) be the maximum number of nodes on any path from an exposed node to an output node (counting both the start and the end nodes). We call l(D) the depth of D. To each node v of D we assign a non negative integer called its level in the following manner. For each output node v of D, set level(v) = l(D)−1; drop all the output nodes from D to get a new DAG D_1. For each output node v of D_1, set level(v) = l(D)−2; again drop all the output nodes from D_1 to get a new DAG D_2. Continue this process until all nodes of D have been assigned level numbers. The level numbers of the nodes partition V into l disjoint subsets S_0,...,S_{l−1}, where l = l(D) and S_i = {v : level(v) = i}. Note that all output nodes are at the same level, but the exposed nodes can be at different levels. However, all nodes at level zero are necessarily exposed nodes.
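The level assignment can be carried out by repeatedly peeling off output nodes, as in the following Python sketch. The representation (a list of node names and a list of (tail, head) arcs) and the function name are assumptions made for this illustration. The number of peeling rounds equals the depth l(D), so the depth does not have to be computed separately.

```python
def assign_levels(nodes, arcs):
    """Return (depth, level) for a DAG given as node names and (tail, head) arcs."""
    remaining = set(nodes)
    layers = []
    while remaining:
        # Output nodes of the current DAG: no arc leads to a node that is still present.
        outputs = {v for v in remaining
                   if not any(tail == v and head in remaining for tail, head in arcs)}
        if not outputs:
            raise ValueError("the arcs contain a directed cycle")
        layers.append(outputs)
        remaining -= outputs
    depth = len(layers)
    level = {v: depth - 1 - i for i, layer in enumerate(layers) for v in layer}
    return depth, level


# Example: a and b feed c; c and e feed the single output node d.
depth, level = assign_levels(["a", "b", "c", "d", "e"],
                             [("a", "c"), ("b", "c"), ("c", "d"), ("e", "d")])
```

In this example the exposed nodes a and b are at level 0 while the exposed node e is at level 1, which illustrates that exposed nodes can be at different levels.
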

An assignment α on D = (V,A) is a function α : A → N which assigns a positive integer to each arc of D. Let n and m be two positive integers with n > m and D be a DAG. An assignment α is said to be proper with respect to (n,m,D) if the following condition holds.

For any node v of D, (a) Σ_{e∈∆(v)} α(e) = m and (b) Σ_{e∈Γ(v)} α(e) ≤ n.

For any node v, we define the fan-in of v to be µ(v) = Σ_{e∈Γ(v)} α(e). Thus, for a proper assignment α on (n,m,D) and any node v, we have µ(v) ≤ n. For any exposed node v, we have µ(v) = 0.

A structure is a tuple S = (n,m,D = (V,A),α) where α is a proper assignment on (n,m,D). By an exposed or output node of a structure S we will mean an exposed or output node of the underlying DAG D. Similarly, by the depth of a structure we will mean the depth of the underlying DAG.

5.1 Construction

Given a structure S and an (n,m) compression function h, we can define a hash function h_S in the following manner. The hash function takes as input a message x (whose length we specify later) and produces as output a digest y = h_S(x). The basic idea is to invoke the compression function h for each node v of D. The function h takes n bits as input and produces m bits as output. To ensure this we have to parse (or format) the message x properly. We first describe this formatting procedure. For any node v, the input to v will be written as z(v) and the output of v will be written as y(v). The input z(v) is formed by concatenating a part of the message x and some portions of the outputs of previous invocations of h as is made precise below. The substring of the message which is provided as input to v is denoted by x(v) and is of length |x(v)| = n − µ(v). As a notational convenience, we will assume V = {v_1,...,v_t} and write x_i = x(v_i), z_i = z(v_i) and y_i = y(v_i).

We associate a non empty string β(e) of length at most m to each arc e of D in the following manner. Let ∆(v_i) = {e_{i,1},...,e_{i,k_i}} and write y_i = y_{i,1}||...||y_{i,k_i}, where |y_{i,j}| = α(e_{i,j}) for 1 ≤ j ≤ k_i. Then β(e_{i,j}) = y_{i,j}. For any node v_i write Γ(v_i) = {e_{i,1},...,e_{i,r_i}}. Then the input z_i to v_i is formed by concatenating x_i and β(e_{i,1}),...,β(e_{i,r_i}), i.e., z_i = x_i||β(e_{i,1})||...||β(e_{i,r_i}). For any exposed node v, we have Γ(v) = ∅ and consequently z(v) = x(v) and |x(v)| = n. Given a message x, the computation of h_S(x) is described as follows.

Computation of h_S(x)

1. For i = 0 to l(D) − 1 do
2.     For v_j ∈ S_i do
3.         set y_j = h(z_j).
4.     End do.
5. End do.
6. z = λ (the empty string).
7. For v ∈ S_{l(D)−1} set z = z||y(v).
8. output z.
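The computation of h_S(x) can be sketched in Python as follows. Several details are assumptions made for the illustration and are not fixed by the text: bit strings are strings of '0' and '1'; the structure is given as an ordered node list v_1,...,v_t and a list of arcs (tail, head, α); the message is taken to be x(v_1)||...||x(v_t) in node order; and toy_h is a hypothetical stand-in for h with n = 24 and m = 8. Instead of iterating level by level, the sketch evaluates any node whose incoming arcs are all available, which produces the same values y(v).

```python
import hashlib

N, M = 24, 8  # toy parameters (assumed)


def toy_h(z: str) -> str:
    """Hypothetical (N, M) compression function: truncated SHA-256 of a bit string."""
    assert len(z) == N
    digest = hashlib.sha256(z.encode("ascii")).digest()
    return "".join(f"{byte:08b}" for byte in digest)[:M]


def dag_hash(x, nodes, arcs, h=toy_h, n=N):
    """Compute h_S(x); `nodes` lists v_1..v_t and `arcs` lists (tail, head, alpha)."""
    fan_in = {v: sum(a for _, head, a in arcs if head == v) for v in nodes}
    # Parse x into x(v_1), ..., x(v_t) with |x(v)| = n - mu(v).
    x_part, pos = {}, 0
    for v in nodes:
        x_part[v] = x[pos:pos + n - fan_in[v]]
        pos += n - fan_in[v]
    if pos != len(x):
        raise ValueError("message length does not match the structure")
    y, beta, pending = {}, {}, list(nodes)
    while pending:
        # Pick any node all of whose incoming arcs already carry a value beta(e).
        ready = next((v for v in pending
                      if all(i in beta for i, arc in enumerate(arcs) if arc[1] == v)), None)
        if ready is None:
            raise ValueError("the arcs contain a directed cycle")
        incoming = [beta[i] for i, arc in enumerate(arcs) if arc[1] == ready]
        y[ready] = h(x_part[ready] + "".join(incoming))       # z(v) = x(v)||beta(...)
        offset = 0
        for i, (tail, _, alpha) in enumerate(arcs):           # split y(v) over outgoing arcs
            if tail == ready:
                beta[i] = y[ready][offset:offset + alpha]
                offset += alpha
        pending.remove(ready)
    outputs = [v for v in nodes if all(tail != v for tail, _, _ in arcs)]
    return "".join(y[v] for v in outputs)


# Example structure: two exposed nodes feeding one output node.
tree_nodes = ["v1", "v2", "v3"]
tree_arcs = [("v1", "v3", M), ("v2", "v3", M)]
message = "01" * 28                      # 24 + 24 + 8 = 56 bits
tree_digest = dag_hash(message, tree_nodes, tree_arcs)
```

For the three-node tree, the two exposed nodes each consume 24 message bits and the output node consumes the remaining 8 bits together with the two 8-bit outputs of its predecessors.
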

We say that the hash function h_S is associated to the structure S and the compression function h.

Remark: The loop in Steps 2 to 4 involves the invocation of h for each node in S_i. These invocations can be carried out in parallel and hence a parallel execution of the algorithm will require exactly l(D) parallel rounds. Thus, the depth of a structure determines the number of parallel rounds required to compute the output of the associated hash function.

5.2 Properties of h_S

The following result describes the lengths of the input and output strings of the hash function h_S.

Proposition 2 Let S = (n,m,D = (V,A),α) be a structure and h : {0,1}^n → {0,1}^m be a compression function. Then h_S : {0,1}^N → {0,1}^M where N = t(n − m) + sm and M = sm, where t = |V| and s is the number of output nodes in D.

Proof. The outputs of all the output nodes are concatenated and provided as output of h_S. The length of the output of each node is m bits, hence the length of the output of h_S is sm bits.

The calculation of the input size is as follows. There are t nodes in D. The function h is invoked once for each of these nodes and hence h is invoked a total of t times. Each invocation of h requires an n-bit input. Thus, a total of tn bits are required as input to all the invocations. An input to an invocation of h either comes directly from the message x or is a part of the intermediate output of some previous invocation of h. There are (t − s) intermediate outputs which provide a total of (t − s)m bits. Hence the message x has to provide a total of exactly tn − (t − s)m = t(n − m) + sm bits.
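As a worked instance of the counting argument, consider a hypothetical structure that is not taken from the paper: a complete binary tree with t = 7 nodes (four exposed leaves, two internal nodes and s = 1 output node), with the SHA-256 sized parameters n = 768 and m = 256, and with every arc assigned m bits. The formula of Proposition 2 and a direct count over the nodes agree:

```latex
% Assumed illustration: complete binary tree on 7 nodes, n = 768, m = 256,
% each arc carries m bits, so every non-leaf node has fan-in mu = 2m = 512.
\begin{align*}
N &= t(n-m) + sm = 7 \cdot 512 + 1 \cdot 256 = 3840, \qquad M = sm = 256, \\
N &= \underbrace{4 \cdot 768}_{\text{leaves, } \mu = 0}
   + \underbrace{3 \cdot (768 - 512)}_{\text{other nodes, } \mu = 512}
   = 3072 + 768 = 3840 .
\end{align*}
```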

The next result shows that the construction described above preserves the property of collision resistance.

Theorem 3 Let h_S be a hash function constructed from a structure S = (n,m,D,α) and a compression function h as described above. Then, it is possible to find a collision for h_S if and only if it is possible to find a collision for h.

Proof. If: We have to show that any collision for h can be extended to a collision for h_S. Let x_1 and x'_1 be distinct n-bit strings which collide for h. Let v be an exposed node of the structure S. We now define two strings x and x' in the domain of h_S such that x ≠ x' and h_S(x) = h_S(x'). Note that to define x and x' it is enough to define the corresponding inputs x(u) and x'(u) to each node u of S. We do this as follows: Set x(v) = x_1, x'(v) = x'_1 and for any u ≠ v, set x(u) and x'(u) both to be equal to an arbitrary binary string of appropriate length. Then it is clear that x ≠ x'. Moreover, h_S(x) = h_S(x') since the outputs of the invocation of h at node v are equal and the inputs to all other nodes are equal. Thus, x and x' provide a collision for h_S.

Only If: For 0 ≤ i ≤ l(D) − 1, we define three sequences of sets ZList_i, XList_i and YList_i, where

XList_i = {x(v) : level(v) = i}, ZList_i = {z(v) : level(v) = i} and YList_i = {y(v) : level(v) = i}.

Note that the message x can be written as a concatenation (in an appropriate order) of the strings in XList_i for 0 ≤ i ≤ l(D) − 1.

For the proof, assume that there are two messages x and x' such that x ≠ x' but h_S(x) = h_S(x'). We show that it is possible to find a collision for h. In the following, we will use primed and unprimed notations to denote quantities corresponding to x' and x respectively.

Our proof technique is the following. Assume that there is no collision for any of the invocations of h. We show that this implies x = x' which contradicts the hypothesis that x ≠ x'. Hence, there must be a collision for some invocation of h. We now turn to the proof of the fact that if there is no collision for h, then x = x'. This is proved by backward induction on i. More precisely, we show that if there is no collision for h, then for each i, we have XList_i = XList'_i. Consequently, x = x'. We now turn to the actual proof.

We are given that h_S(x) = h_S(x'). This implies that YList_{l(D)−1}(x) = YList'_{l(D)−1}(x') and consequently for each v ∈ S_{l(D)−1}, we have h(z(v)) = y(v) = y'(v) = h(z'(v)). Since there is no collision for h, we must have z(v) = z'(v) and consequently ZList_{l(D)−1} = ZList'_{l(D)−1}. This in turn implies that for each v ∈ S_{l(D)−1} we have x(v) = x'(v) and for each u ∈ S_{l(D)−2} we have y(u) = y'(u). Hence XList_{l(D)−1} = XList'_{l(D)−1} and YList_{l(D)−2} = YList'_{l(D)−2}.

For the induction step assume that we have shown XList_{i+1} = XList'_{i+1} and YList_i = YList'_i for all i ≥ k + 1. Then using an argument similar to the one given above it follows that XList_i = XList'_i and YList_{i−1} = YList'_{i−1}. This shows that XList_i = XList'_i for 1 ≤ i ≤ l(D)−1. Now one more application of the previous argument shows that XList_0 = XList'_0. Hence XList_i = XList'_i for all 0 ≤ i ≤ l(D)−1 as desired.

6 Hashing Arbitrary Length Strings

The hash function h_S can handle only strings of one particular length. We would like to obtain a function which can handle strings of any length. Techniques to handle arbitrary length strings have been introduced before by Damgård [2] (see Construction II in Section 3.2) for the special case of structures where the underlying DAG is a directed path. It does not seem to be easy to adapt the technique of [2] to the more general case of DAGs that we consider here. Thus, we present a new method for handling arbitrary length strings, which is also of independent interest. To describe the construction of a hash function which can handle arbitrary length strings we need to introduce an infinite family of DAGs. To keep the description reasonably simple, we assume that each DAG in the family has a single output node. The precise definition of the family that we consider is given below.

Let {D_k}_{k≥1} be a family of DAGs where D_k = (V_k,A_k) is such that |V_k| = k and D_k has exactly one output node. Given positive integers n and m with n > m, a family of structures F is defined as F = {S_k}_{k≥1} where S_k = (n,m,D_k,α_k), where α_k is a proper assignment on D_k. Given a compression function h : {0,1}^n → {0,1}^m, and a family of structures F, we define a family of hash functions {h_k}_{k≥1}, where h_k = h_{S_k}. From Proposition 2, we have

h_k : {0,1}^{k(n−m)+m} → {0,1}^m.


Note that $h_1 = h$. From Theorem 3, we know that the ability to find a collision for any $h_k$ implies the ability to find a collision for $h$.

We want to define a hash function which can handle strings of any length. Each $h_k$ can handle only fixed length strings. More precisely, $h_1$ can handle strings of length $n$, $h_2$ can handle strings of length $2n - m$, $h_3$ can handle strings of length $3n - 2m$ and so on. First we need to “fill the gaps” in the lengths. For this we define a function $h^* : \cup_{i \ge 1} \{0,1\}^i \to \{0,1\}^m$ in the following manner.

$$
h^*(x) =
\begin{cases}
h_1(x \| 0^{n-|x|}) & \text{if } 1 \le |x| \le n; \\
h_{k+1}(x \| 0^{(k+1)(n-m)+m-|x|}) & \text{if } k(n-m)+m < |x| \le (k+1)(n-m)+m.
\end{cases}
\qquad (2)
$$

Note that the amount of padding done to $x$ in the definition of $h^*$ is at most $(n - 1)$ in the first case and at most $(n - m - 1)$ in the second case. The function $h^*$ is not collision resistant. For example, the images of the strings $1$ and $10^{n-1}$ are the same, since $h^*(1) = h(10^{n-1}) = h^*(10^{n-1})$. We modify the function $h^*$ to a function $h^\infty : \cup_{i \ge 1} \{0,1\}^i \to \{0,1\}^m$ which is collision resistant (assuming that $h$ is collision resistant). To do this we first need to introduce a length extracting function.

Given a binary string $x$, recall that $\chi(x)$ denotes the minimum length binary representation of the length of $x$. For example, if $x = 110001101010$, then $\chi(x) = 1100$, since the length of $x$ is 12. The iterates of $\chi()$ are defined as usual: $\chi^0(x) = x$ and for $i > 0$, $\chi^i(x) = \chi(\chi^{i-1}(x))$. The following result states some simple properties of the function $\chi()$. Recall that the reverse of a binary string $y$ is denoted by $y^r$.
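The padding rule in the definition (2) of h∗ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: bit strings are modelled as Python strings over '0' and '1', the name `pad_for_h_star` is hypothetical, and the toy parameters n = 8, m = 4 are chosen purely for readability. The function returns the index k of the hash function h_k that h∗ would invoke together with the zero-padded input.

```python
def pad_for_h_star(x: str, n: int, m: int) -> tuple[int, str]:
    """Return (k, padded) such that h*(x) = h_k(padded), following rule (2)."""
    assert n > m >= 1 and len(x) >= 1
    if len(x) <= n:
        k = 1                                # first case of (2): pad up to n bits
    else:
        # least k with |x| <= k(n-m)+m, i.e. k = ceil((|x| - m) / (n - m))
        k = -(-(len(x) - m) // (n - m))
    target = k * (n - m) + m                 # input length accepted by h_k
    return k, x + "0" * (target - len(x))


n, m = 8, 4                                  # toy parameters (assumption)
print(pad_for_h_star("1", n, m))             # (1, '10000000')
print(pad_for_h_star("1" + "0" * 7, n, m))   # (1, '10000000')
print(pad_for_h_star("1" * 9, n, m))         # (2, '111111111000')
```

The first two calls produce the same padded input, which is exactly the collision between the strings 1 and 10^{n−1} mentioned above.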

Proposition 4 Let $x$ be a binary string. Then

1. The first bit of $y = \chi(x)$ is 1 and hence the last bit of $y^r$ is also 1.

2. $\chi(x) = x$ if and only if $x = 1$ or $x = 10$.

3. $|\chi(x)| = 1 + \lfloor \log |x| \rfloor = \lceil \log(|x| + 1) \rceil$.

4. If $|x| > 1$, then there is a positive integer $j$ such that $\chi^j(x) = 10$.

Remark: For the construction of $h^\infty$ given below to work, there must exist a $j$ such that $|X_j| \le n - m$, where $X_j = \chi^j(x)$. If $n - m = 1$ and $|x| > 1$, then this cannot be achieved. Thus, henceforth we will assume $n - m \ge 2$. From a practical point of view, this is not really a constraint since all known practical compression functions satisfy this condition.
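The length extracting function χ and its iterates are easy to experiment with. The following Python sketch models bit strings as strings over '0' and '1'; the helper names `chi` and `chi_iter` are illustrative assumptions. It reproduces the example χ(110001101010) = 1100 and shows the iterates reaching the fixed point 10 of Proposition 4.

```python
def chi(x: str) -> str:
    """Minimum-length binary representation of |x| (x must be non-empty)."""
    assert len(x) >= 1
    return format(len(x), "b")


def chi_iter(x: str, i: int) -> str:
    """The i-th iterate of chi, with chi_iter(x, 0) == x."""
    for _ in range(i):
        x = chi(x)
    return x


x = "110001101010"        # a string of length 12
print(chi(x))             # 1100
print(chi_iter(x, 2))     # 100
print(chi_iter(x, 3))     # 11
print(chi_iter(x, 4))     # 10
print(chi("1"), chi("10"))  # 1 10
```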

Now we are in a position to define the function $h^\infty$. Recall that $x^r$ denotes the reverse of the string $x$. Let IV be an initialization vector, i.e., a string of length $m$.

Computation of $h^\infty(x)$.

1. Define $X_0 = x$ and for $i > 0$, define $X_i = \chi^i(X_0) = \chi(X_{i-1})$.

2. Let $j$ be the least positive integer such that $|X_j| \le n - m$.

3. Define $Y_0 = h^*(IV \| 0 \| X_0)$.

4. For $1 \le i \le j - 1$, define $Y_i = h^*(Y_{i-1} \| 1 \| X_i)$.

5. Define $Y_j = h^*(Y_{j-1} \| X_j^r)$.

6. Output $Y_j$.

Remark: The value of $j$ in the above algorithm will be more than one only if the length of the message is at least $2^{n-m}$, since $|X_1| = |\chi(x)| \le n - m$ whenever $|x| < 2^{n-m}$. For practical compression functions (such as SHA, RIPEMD, etc.) the value of $(n - m)$ is at least 128. Thus, for all practical compression functions and practical sized messages the value of $j$ will be equal to one.
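The six steps above can be sketched in Python as follows. Several choices here are assumptions made only for illustration and are not fixed by the text: the family h_k is instantiated with a directed path (chaining value first, then n − m fresh bits per node), the compression function `h` is a stand-in built from truncated SHA-256 over toy parameters N = 64 and M = 32, bit strings are Python strings over '0' and '1', and the all-zero IV is arbitrary.

```python
import hashlib

N, M = 64, 32  # toy parameters (assumption): h maps N bits to M bits


def h(block: str) -> str:
    """Stand-in compression function on bit strings (not a real design)."""
    assert len(block) == N
    digest = hashlib.sha256(block.encode("ascii")).digest()
    return "".join(f"{byte:08b}" for byte in digest)[:M]


def h_k(w: str) -> str:
    """Dipath instance of h_k; |w| must equal k(N-M)+M for some k >= 1."""
    assert len(w) >= N and (len(w) - M) % (N - M) == 0
    y = h(w[:N])
    for pos in range(N, len(w), N - M):
        y = h(y + w[pos:pos + N - M])
    return y


def h_star(x: str) -> str:
    """Rule (2): zero-pad x to the next admissible length, then apply h_k."""
    if len(x) <= N:
        return h_k(x + "0" * (N - len(x)))
    k = -(-(len(x) - M) // (N - M))          # ceil((|x| - M) / (N - M))
    return h_k(x + "0" * (k * (N - M) + M - len(x)))


def chi(x: str) -> str:
    return format(len(x), "b")


def h_inf(x: str, iv: str = "0" * M) -> str:
    assert len(x) >= 1 and len(iv) == M
    xs = [x, chi(x)]                         # X_0, X_1
    while len(xs[-1]) > N - M:               # least j >= 1 with |X_j| <= N-M
        xs.append(chi(xs[-1]))
    j = len(xs) - 1
    y = h_star(iv + "0" + xs[0])             # step 3
    for i in range(1, j):                    # step 4
        y = h_star(y + "1" + xs[i])
    return h_star(y + xs[j][::-1])           # step 5: reversed X_j


a, b = "1", "1" + "0" * (N - 1)
print(h_star(a) == h_star(b))   # True: h* alone collides on these inputs
print(h_inf(a) == h_inf(b))     # expected False unless the toy h collides
```

With these toy parameters j exceeds one only for messages of at least 2^32 bits, so the loop in step 4 is never entered for the inputs shown.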

We next prove that $h^\infty$ is collision resistant if $h$ is collision resistant.

Theorem 5 If there is an $(\epsilon, q, L)$-algorithm to solve $ALC(n, m, L)$ for $h^\infty$, then there is an $(\epsilon, q + 2\eta)$-algorithm to solve $Col(n, m)$ for $h$, where $\eta$ is the number of invocations of $h$ made by $h^\infty$ in hashing a message of length $L$.


Proof. Given any message $x$, the computation of the digest involves several invocations of the function $h^*$. At each stage, the function $h^*$ in turn invokes $h_k$ on a suitably padded string. There are $(j + 1)$ invocations of $h^*$. Suppose that at the $i$th ($0 \le i \le j$) invocation of $h^*$, the function $h_{k_i}$ is invoked. Also denote the padded input to $h_{k_i}$ by $W_i$. Thus, $Y_0 = h^*(IV \| 0 \| X_0) = h_{k_0}(W_0)$, for $1 \le i \le j - 1$, $Y_i = h^*(Y_{i-1} \| 1 \| X_i) = h_{k_i}(W_i)$ and $Y_j = h^*(Y_{j-1} \| X_j^r) = h_{k_j}(W_j)$. Further, we have $|W_i| = k_i(n - m) + m$ and for $0 \le i \le j$, $|Y_i| = m$.

Assume $h^\infty(x) = h^\infty(x')$ and $x \ne x'$. We show that this implies that there is a collision for $h$. The proof is by backward induction. We will use unprimed and primed notation to denote the quantities corresponding to $x$ and $x'$ respectively.

By hypothesis, we have $h^\infty(x) = Y_j = Y'_{j'} = h^\infty(x')$. From the definition of $h^\infty$ we have

$$h^*(Y_{j-1} \| X_j^r) = h_{k_j}(W_j) = Y_j = Y'_{j'} = h_{k'_{j'}}(W'_{j'}) = h^*(Y'_{j'-1} \| X'^r_{j'}).$$

By definition of $j$ and $j'$, we have $|X_j|, |X'_{j'}| \le n - m$ and hence $|Y_{j-1} \| X_j|, |Y'_{j'-1} \| X'_{j'}| \le n$. From the definition of $h^*$ it follows that $k_j = k'_{j'} = 1$ and $|W_j| = |W'_{j'}| = n$. If $W_j \ne W'_{j'}$ then we obtain a collision for $h_1$ and hence for $h$ (since $h = h_1$) and we are done. On the other hand, if there is no collision for $h$, we must have $W_j = W'_{j'}$. Hence

$$W_j = Y_{j-1} \| X_j^r \| 0^{n-m-|X_j|} = Y'_{j'-1} \| X'^r_{j'} \| 0^{n-m-|X'_{j'}|} = W'_{j'}.$$

Since both the strings $X_j^r$ and $X'^r_{j'}$ end with a 1, by the above condition we must have $|X_j^r| = |X'^r_{j'}|$. This implies $Y_{j-1} = Y'_{j'-1}$ and $X_j = X'_{j'}$. Now there are two cases to consider.

Case $j = j'$: We have $\chi(X_{j-1}) = X_j = X'_{j'} = \chi(X'_{j'-1})$ and hence $|X_{j-1}| = |X'_{j'-1}|$. Thus, $|W_{j-1}| = |W'_{j'-1}|$ and consequently $k_{j-1} = k'_{j'-1}$. Also we have

$$h_{k_{j-1}}(W_{j-1}) = Y_{j-1} = Y'_{j'-1} = h_{k'_{j'-1}}(W'_{j'-1}).$$

Using Theorem 3, we obtain that either $W_{j-1} = W'_{j'-1}$ or we obtain a collision for $h$. In the second case, we are done and in the first case we obtain $W_{j-1} = W'_{j'-1}$ and consequently $Y_{j-2} = Y'_{j'-2}$ and $X_{j-1} = X'_{j'-1}$.

Repeating the above argument for $i = j - 2, \ldots, 1$, we obtain that $W_{j-2} = W'_{j'-2}$, $W_{j-3} = W'_{j'-3}$, ..., $W_1 = W'_1$ and consequently $Y_{j-3} = Y'_{j'-3}$ and $X_{j-2} = X'_{j'-2}$, $Y_{j-4} = Y'_{j'-4}$ and $X_{j-3} = X'_{j'-3}$, ..., $Y_0 = Y'_0$ and $X_1 = X'_1$. Now we have

$$\chi(X_0) = X_1 = X'_1 = \chi(X'_0).$$

Consequently, $|X_0| = |X'_0|$ and so $|W_0| = |W'_0|$. This forces $k_0 = k'_0$. Thus, we have

$$h_{k_0}(W_0) = Y_0 = Y'_0 = h_{k_0}(W'_0).$$

Again using Theorem 3, we have that either there is a collision for $h$ or $W_0 = W'_0$. In the first case we are done and in the second case, we have $W_0 = W'_0$ and hence $x = X_0 = X'_0 = x'$ which contradicts the hypothesis. Hence there is a collision for $h$.

Case $j \ne j'$: Without loss of generality assume $j' > j$ and $j' - j = l > 0$. Proceeding as in the above case, we have $Y_0 = Y'_l$ and $X_1 = X'_{l+1}$. Again $\chi(X_0) = X_1 = X'_{l+1} = \chi(X'_l)$ and hence $|X_0| = |X'_l|$ which implies $|W_0| = |W'_l|$. This forces $k_0 = k'_l$. Thus, we have

$$h_{k_0}(W_0) = Y_0 = Y'_l = h_{k'_l}(W'_l).$$

The string $W_0$ is formed by (possibly) padding 0's to the end of $IV \| 0 \| X_0$ and the string $W'_l$ is formed by (possibly) padding 0's to the end of $Y'_{l-1} \| 1 \| X'_l$. Thus, $W_0$ and $W'_l$ differ in the $(m + 1)$th bit position and hence $W_0 \ne W'_l$. Hence by Theorem 3 there must be a collision for $h$.


Let $\mathcal{A}$ be an $(\epsilon, q, L)$-algorithm to solve $ALC(n, m, L)$ for $h^\infty$. Then $\mathcal{A}$ is successful with probability $\epsilon$ and in this case let $(x, x')$ be the output of $\mathcal{A}$. Thus, $|x|, |x'| \le L$ and so $h^\infty$ invokes $h$ at most $\eta$ times for hashing either $x$ or $x'$. The algorithm $\mathcal{B}$ to solve $Col(n, m)$ for $h$ is as follows. $\mathcal{B}$ first executes $\mathcal{A}$. If $\mathcal{A}$ fails, then $\mathcal{B}$ also fails. If $\mathcal{A}$ succeeds and returns $(x, x')$, then $\mathcal{B}$ invokes $h^\infty$ on both $x$ and $x'$ and “scans backwards” until a collision for $h$ is found. By the above discussion, if $(x, x')$ is a collision for $h^\infty$, then with probability one, the backward scan will produce a collision for $h$. Thus, the success probability of $\mathcal{B}$ is also $\epsilon$. Further, the number of invocations of $h$ made by $\mathcal{B}$ is found as follows: $q$ times during the execution of $\mathcal{A}$ and at most $\eta$ times each on $x$ and $x'$, giving a total of at most $q + 2\eta$ invocations.

7 Comparison to Iterated Hashing

In this section, we compare the new construction to the several variations of the Merkle-Damgård construction. Before getting into the details, we would like to point out a few things.

• Our construction is more general in the sense that it works over an arbitrary DAG, whereas the variations of the Merkle-Damgård algorithm work only with dipaths. Also, we would like to point out that the mechanism in the Merkle-Damgård algorithm for handling arbitrary length strings and the associated argument do not carry over to the case of arbitrary DAGs.

• The detailed comparison that we present below is only to Construction II, since this is the algorithm which can hash arbitrary length strings and assumes $h$ to be only collision resistant.

• From a practical point of view, in general, we do not expect our algorithm to replace Construction III. For most cryptographic purposes, computation of the hash function requires a very small fraction of the total time. Hence, parallel hash computation algorithms (and consequently DAGs) would be required only for special purpose applications. On the other hand, we believe that the issue of obtaining an efficient parallel hash algorithm which can handle arbitrary length strings is of significant theoretical interest.

7.1 Padding Efficiency

The function $h^\infty$ performs some amount of padding to the string $x$ before hashing it. We determine the maximum amount of padding that is done and show that this is (asymptotically) less than the amount of padding performed in Construction II. Given an integer $i$, we define $\log^*(i)$ to be the least integer $k$ such that

$$\underbrace{\log(\log(\cdots(\log}_{k}(i))\cdots)) \le 1.$$

Note that the parameters $n$ and $m$ of the compression function $h$ are independent of the message length $|x|$ and can be assumed to be constant in an asymptotic analysis.

Proposition 6 Let $x$ be a binary string with $|x| > n$. Then the maximum amount of padding done to the string $x$ in the computation of $h^\infty(x)$ is

$$n + j(n - m) + |\chi(x)| + |\chi^2(x)| + \cdots + |\chi^{j-1}(x)|$$

where $j$ is the minimum positive integer such that $|\chi^j(x)| \le n - m$.


Proof. The maximum amount of padding in Step 3 is $m + 1 + (n - m - 1)$. In Step 4, there is a loop; for each value of $i$ ($1 \le i \le j - 1$) the maximum amount of padding is $1 + |X_i| + (n - m - 1) = (n - m) + |\chi^i(x)|$. The padding in Step 5 is equal to $(n - m)$. Adding up all these gives the required result.

The maximum amount of padding done to $x$ in Construction II is $2n - m - 2 + \lceil |x| / (n - m - 1) \rceil$ (see [7]). Assuming $n$ and $m$ to be constants, the amount of padding is $O(|x|)$. On the other hand, assuming $n$ and $m$ to be constants, the maximum amount of padding in our algorithm is bounded above by $O((\log^* |x|)(\log |x|))$. Hence, in an asymptotic sense our padding scheme is more efficient than the Merkle-Damgård padding scheme. The asymptotic inefficiency in the Merkle-Damgård construction arises due to the fact that one bit of padding is done to each message block.
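To get a feeling for the two padding bounds, the following Python sketch evaluates the expression of Proposition 6 and the Construction II bound $2n - m - 2 + \lceil |x|/(n-m-1) \rceil$ as functions of the message length alone. The function names and the parameter choice n = 672, m = 160 (so that n − m = 512) are illustrative assumptions and are not taken from the text.

```python
def max_padding_new(length: int, n: int, m: int) -> int:
    """n + j(n-m) + |chi(x)| + ... + |chi^{j-1}(x)| for a message of |x| = length."""
    assert length > n > m and n - m >= 2
    j, cur, extra = 1, length.bit_length(), 0   # cur = |X_1| = |chi(x)|
    while cur > n - m:                          # X_j does not yet fit in n-m bits
        extra += cur                            # contributes |chi^i(x)| for i < j
        cur = cur.bit_length()                  # |X_{i+1}| is the bit length of |X_i|
        j += 1
    return n + j * (n - m) + extra


def max_padding_construction_ii(length: int, n: int, m: int) -> int:
    """2n - m - 2 + ceil(|x| / (n - m - 1))."""
    return 2 * n - m - 2 + -(-length // (n - m - 1))


n, m = 672, 160                                  # illustrative parameters
for length in (10**3, 10**6, 10**9):
    print(length, max_padding_new(length, n, m),
          max_padding_construction_ii(length, n, m))
# the first bound stays at 1184 here, while the second grows linearly in |x|
```

For example, with a message of 10^6 bits the first bound evaluates to 1184 and the second to 3139.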

7.2 Invocation Efficiency

We compare the invocation efficiency of our algorithm to Construction II, i.e., we compare the number of invocations of the compression function $h$ for a message $x$ made by Construction II and our algorithm.

We first compute the number of invocations of $h$ made by our algorithm. The algorithm to compute $h^\infty$ invokes $h^*$ exactly $j + 1$ times, i.e., for $i = 0, \ldots, j$. Suppose as in the proof of Theorem 5 that the $i$th invocation of $h^*$ is made on the string $W_i$ which is obtained by possibly padding 0s to $IV \| 0 \| X_0$ if $i = 0$; to $Y_{i-1} \| 1 \| X_i$ if $1 \le i \le j - 1$; and to $Y_{j-1} \| X_j^r$ if $i = j$. Then from Proposition 2, it follows that $|W_i| = (n - m)(k_i - 1) + n$. Now $|W_i| = m + 1 + |X_i| + |\alpha_i|$ for $0 \le i \le j - 1$ and $|W_j| = m + |X_j| + |\alpha_j|$, where the $\alpha_i$ are the all-zero strings which are used as pads to obtain the $W_i$. Further, $|\alpha_i| \le n - m - 1$ for all $0 \le i \le j$. Thus, we obtain $k_i = \lceil (|X_i| + 1)/(n - m) \rceil$ if $0 \le i \le j - 1$; and $k_i = \lceil |X_i|/(n - m) \rceil$ if $i = j$. The value of $k_i$ is the number of invocations of $h$ made by $h_{k_i}$. Note that $k_j = 1$. Hence the total number of invocations of $h$ made in the computation of $h^\infty$ is obtained by adding all the $k_i$ and is given in the following result.

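Proposition 7 The total number $B$ of invocations of $h$ made in the computation of $h^\infty$ is equal to

$$B = \left\lceil \frac{|x| + 1}{n - m} \right\rceil + \left\lceil \frac{|\chi(x)| + 1}{n - m} \right\rceil + \cdots + \left\lceil \frac{|\chi^{j-1}(x)| + 1}{n - m} \right\rceil + 1.$$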

In Construction II, the number of invocations $A$ of the compression function $h$ is equal to $A = 1 + \lceil |x| / (n - m - 1) \rceil$. On the other hand, the number $B$ of invocations of $h$ in our algorithm is given by Proposition 7. Note that $j \le \log^* |x|$. Using this fact and some simple algebraic simplification we obtain

$$A - B > \frac{|x|}{(n - m)(n - m - 1)} - \frac{n - m + 1 + \log^* |x| \, (n - m + 2 + \log |x|)}{n - m} > 0$$

for sufficiently large $|x|$. Thus, in an asymptotic sense, our algorithm is more efficient than the Merkle-Damgård algorithm.
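The two invocation counts can also be compared numerically. The Python sketch below evaluates $B$ from Proposition 7 and $A = 1 + \lceil |x|/(n-m-1) \rceil$ as functions of the message length; the function names and the parameters n = 672, m = 160 are assumptions made only for illustration.

```python
def invocations_new(length: int, n: int, m: int) -> int:
    """B from Proposition 7 for a message of the given bit length."""
    d = n - m
    assert d >= 2 and length >= 1
    total, cur = 0, length                # cur = |X_i|, starting with i = 0
    while True:
        total += -(-(cur + 1) // d)       # k_i = ceil((|X_i| + 1) / (n - m))
        nxt = cur.bit_length()            # |X_{i+1}| = |chi(X_i)|
        if nxt <= d:                      # X_{i+1} is X_j, the final short string
            return total + 1              # k_j = 1
        cur = nxt


def invocations_construction_ii(length: int, n: int, m: int) -> int:
    """A = 1 + ceil(|x| / (n - m - 1))."""
    return 1 + -(-length // (n - m - 1))


n, m = 672, 160                           # illustrative parameters
for length in (10**4, 10**6, 10**8):
    a = invocations_construction_ii(length, n, m)
    b = invocations_new(length, n, m)
    print(length, a, b, a - b)
```

With these parameters a message of 10^6 bits gives A = 1958 and B = 1955; the gap A − B grows roughly like $|x| / ((n-m)(n-m-1))$, in line with the inequality above.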

7.3 Optimal Construction?

Consider the problem of secure domain extension to arbitrary length strings. Both Construction II and our algorithm perform this task. We have shown that our algorithm improves upon the Merkle-Damgård algorithm both in terms of reducing the amount of padding and the number of invocations. This suggests the following two problems.


Lower Bound: Let A be an algorithm which securely extends the domain of a compression function

h to arbitrary length strings. What is the minimum amount of padding and minimum number of

invocations of h that A has to make on an input x of length |x|?

Construction: Is there a construction which improves upon our algorithm?

At this point, we do not know the answer to either of these two questions. In particular, for the first question, we have not even been able to prove that the amount of padding cannot be constant (i.e., independent of the length of $x$). On the other hand, for the second question, it might seem that a padding of length proportional to $\log |x|$ might be sufficient. However, actually obtaining such a construction along with a correctness proof does not seem to be easy. We believe that resolving these questions is a task for future research and that the answers will be important for the understanding of collision resistant hash functions.

8 Concrete Examples

In this section, we provide some examples of DAGs which can be used to extend the domain of a collision

resistant compression function. To do this it will be easier to define a notion of composition of structures

in the following manner.

Let $S_1 = (n, m, D_1 = (V_1, A_1), \alpha_1)$ and $S_2 = (n, m, D_2 = (V_2, A_2), \alpha_2)$ be two structures such that the number of output nodes of $D_1$ is at most equal to the number of exposed nodes of $D_2$. Let $\{u_1, \ldots, u_r\}$ be the output nodes of $D_1$ and $\{v_1, \ldots, v_s\}$ ($r \le s$) be the exposed nodes of $D_2$. Define a DAG $D = (V, A)$, where $V = V_1 \cup V_2$ and $A = A_1 \cup A_2 \cup \{(u_1, v_1), \ldots, (u_r, v_r)\}$. Define a proper assignment $\alpha$ on $D$ in the following manner: $\alpha(e) = \alpha_i(e)$ if $e \in A_i$, $i = 1, 2$ and $\alpha(e) = m$ otherwise. We define $S = S_1 \bullet S_2$ to be the partial composition of $S_1$ and $S_2$ where $S = (n, m, D, \alpha)$. In case $r = s$, i.e., the number of output nodes of $D_1$ is equal to the number of exposed nodes of $D_2$, we will say that $S$ is the total composition or simply the composition of $S_1$ and $S_2$. Also we will denote the total composition by the symbol $\circ$. Note that $\circ$ is an associative operation while $\bullet$ is not, and neither of the two operations is commutative.

From now on we will explicitly write a structure as $S = (n, m, r_1, r_2, D, \alpha)$ where $r_1$ (resp. $r_2$) is the number of exposed (resp. output) nodes of $D$. Thus, we can compose $S_1 = (n, m, r_1, r_2, D_1, \alpha_1)$ and $S_2 = (n, m, r_2, r_3, D_2, \alpha_2)$ to obtain $S = S_1 \circ S_2 = (n, m, r_1, r_3, D, \alpha)$. Let $h_{S_1}$, $h_{S_2}$ and $h_S$ be the hash functions associated with the structures $S_1$, $S_2$ and $S$ respectively. Then $h_{S_1}$ is a $(t_1(n - m) + r_2 m, r_2 m)$ function, $h_{S_2}$ is a $(t_2(n - m) + r_3 m, r_3 m)$ function and $h_S$ is a $((t_1 + t_2)(n - m) + r_3 m, r_3 m)$ function, where $t_1$ and $t_2$ are the numbers of nodes in $D_1$ and $D_2$ respectively.

We now provide some examples of structures. In each of the cases below we assume the existence of a

suitable (n,m) compression function h.

Example 1 (isolated nodes): For $i \ge 1$, define $\mathcal{K}_i = (n, m, i, i, D = (\{1, \ldots, i\}, \emptyset), \alpha)$ to be the structure corresponding to the digraph consisting of $i$ nodes and no arcs. Hence each node is both an exposed and an output node. The depth of $\mathcal{K}_i$ is one. The associated hash function $h_{\mathcal{K}_i}$ is an $(in, im)$ function.

Example 2 (dipath): For $r \ge 1$, define $P^{(r)}$ to be the directed path on $r$ nodes and let $\alpha$ assign $m$ to each arc of $P^{(r)}$. This defines a structure $\mathcal{P}^{(r)} = (n, m, 1, 1, P^{(r)}, \alpha)$. The depth of $\mathcal{P}^{(r)}$ is $r$. The associated hash function $h_{\mathcal{P}^{(r)}}$ is an $(r(n - m) + m, m)$ function. A variation of this structure (which includes an initialization vector) is used in the Merkle-Damgård construction [2, 4].

Example 3 (parallel dipaths): For $r, q \ge 1$, define $P_q^{(r)}$ to be the union of $q$ copies of $P^{(r)}$ and the corresponding structure is denoted by $\mathcal{P}_q^{(r)} = (n, m, q, q, P_q^{(r)}, \alpha)$, where $\alpha$ again assigns $m$ to each arc of $P_q^{(r)}$. The depth of $\mathcal{P}_q^{(r)}$ is $r$ and also note that $\mathcal{K}_i = \mathcal{P}_i^{(1)}$. The associated hash function $h_{\mathcal{P}_q^{(r)}}$ is an $(rq(n - m) + qm, qm)$ function.

Example 4 (contracting binary tree): For $t \ge 1$, let $T_t$ be the binary tree with $t$ levels and $2^t - 1$ nodes defined by $T_t = (\{1, \ldots, 2^t - 1\}, \{(i, \lfloor i/2 \rfloor) : 2 \le i \le 2^t - 1\})$. We define an assignment $\alpha$ which assigns $m$ to each arc of $T_t$. Then the fan-in of any non-exposed node is $2m$ and since $\alpha$ is proper we must have $n \ge 2m$. We denote the corresponding structure by $\mathcal{T}_t = (n, m, 2^{t-1}, 1, T_t, \alpha)$. The depth of $\mathcal{T}_t$ is $t$. The associated hash function $h_{\mathcal{T}_t}$ is a $((2^t - 1)(n - m) + m, m)$ hash function.

Example 5 (expanding binary tree): For $t \ge 1$, let $I_t$ be the inverted binary tree of $t$ levels: $I_t = (\{1, \ldots, 2^t - 1\}, \{(i, 2i), (i, 2i + 1) : 1 \le i \le 2^{t-1} - 1\})$. The assignment $\alpha$ assigns $(m/2)$ to each arc of $I_t$. We denote the corresponding structure by $\mathcal{I}_t = (n, m, 1, 2^{t-1}, I_t, \alpha)$. The depth of $\mathcal{I}_t$ is $t$. The associated hash function $h_{\mathcal{I}_t}$ is a $((2^t - 1)(n - m) + 2^{t-1} m, 2^{t-1} m)$ function.

Example 6 (parallel structure): For $t \ge 1$ and $r \ge 0$, define $S_t^{(r)} = \mathcal{I}_t \circ \mathcal{P}_{2^{t-1}}^{(r)} \circ \mathcal{T}_t$. The numbers of exposed and output nodes of $S_t^{(r)}$ are both one and its depth is $2t + r$. The structure has $(2^t - 1) + r 2^{t-1} + (2^t - 1)$ nodes, so the associated hash function $h_{S_t^{(r)}}$ is a $((2^{t+1} + r 2^{t-1} - 2)(n - m) + m, m)$ function.

Example 7 (incremental parallel structure): For $r \ge 0$, $t \ge 1$ and $1 \le s \le 2^{t-1}$ define the structure

$$S_t^{(r,s)} = (\mathcal{I}_t \circ \mathcal{P}_{2^{t-1}}^{(r)}) \circ (\mathcal{K}_s \bullet \mathcal{T}_t). \qquad (3)$$

The numbers of exposed and output nodes of $S_t^{(r,s)}$ are both one and its depth is $2t + r + 1$. The hash function $h_{S_t^{(r,s)}}$ associated to the structure $S_t^{(r,s)}$ is a $((2^{t+1} + r 2^{t-1} + s - 2)(n - m) + m, m)$ function.

Remark: We note that the basic idea behind Example 7 is already present in Damgård [2]. However, we provide many more details.
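The bookkeeping behind these examples can be checked mechanically. The Python sketch below tracks only three numbers per structure (nodes, exposed nodes, output nodes) and applies the composition rule for node counts; it does not build the DAGs themselves. The class name `Shape`, the helper names, and the use of a single `compose` function for both total and partial composition are illustrative assumptions. The input length is computed with the pattern shared by the formulas above, namely (number of nodes)(n − m) + (number of output nodes)m.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    nodes: int      # number of nodes of the DAG
    exposed: int    # number of exposed nodes (r1)
    outputs: int    # number of output nodes (r2)

    def io_bits(self, n: int, m: int) -> tuple[int, int]:
        """(input length, output length) of the associated hash function."""
        return self.nodes * (n - m) + self.outputs * m, self.outputs * m


def compose(a: Shape, b: Shape) -> Shape:
    """Partial composition; it is total when a.outputs == b.exposed."""
    assert a.outputs <= b.exposed
    return Shape(a.nodes + b.nodes, a.exposed + b.exposed - a.outputs, b.outputs)


def K(s: int) -> Shape:            # Example 1: s isolated nodes
    return Shape(s, s, s)

def P(r: int, q: int) -> Shape:    # Example 3: q parallel dipaths on r nodes each
    return Shape(r * q, q, q)

def T(t: int) -> Shape:            # Example 4: contracting binary tree
    return Shape(2**t - 1, 2**(t - 1), 1)

def I(t: int) -> Shape:            # Example 5: expanding binary tree
    return Shape(2**t - 1, 1, 2**(t - 1))

def S(t: int, r: int, s: int) -> Shape:   # Example 7
    return compose(compose(I(t), P(r, 2**(t - 1))), compose(K(s), T(t)))


shape = S(t=3, r=2, s=3)
print(shape)                       # Shape(nodes=25, exposed=1, outputs=1)
print(shape.io_bits(n=8, m=4))     # (104, 4)
```

For t = 3, r = 2 and s = 3 the node count 25 agrees with $2^{t+1} + r 2^{t-1} + s - 2$.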

8.1 A Parallelizable Hash Algorithm

We build on Example 7 above to obtain a parallel algorithm for extending the domain of a collision resistant hash function. The algorithm will use $2^{t-1}$ processors and its structure will be $S_t^{(r,s)}$ for some $r$ and $s$ which are determined by the length of the message $x$ in the following manner. Define $\lambda(t) = (2^{t+1} - 2)(n - m) + m$ and $\delta(t) = 2^{t-1}(n - m)$. Let $x$ be the string which is to be hashed.

1. Write $|x| - \lambda(t) = r\delta(t) + \gamma_1$, where $\gamma_1$ is a unique integer from the set $\{1, \ldots, \delta(t)\}$.

2. Write $\gamma_1 = \gamma_2(n - m) + \gamma_3$, where $\gamma_3$ is a unique integer from the set $\{1, \ldots, n - m\}$.

3. Set $x = x \| 0^{n-m-\gamma_3}$. Note that the amount of padding is at most $(n - m - 1)$.

Note that $0 \le \gamma_2 < 2^{t-1}$. The above steps define the parameters $r$, $\gamma_1$, $\gamma_2$ and $\gamma_3$. We further define $s = \gamma_2 + 1$, so that $1 \le s \le 2^{t-1}$ as required in Example 7. For $t \ge 1$, we define the function $g_t^* : \cup_{i \ge \lambda(t)} \{0,1\}^i \to \{0,1\}^m$ by

$$g_t^*(x) = h_{S_t^{(r,s)}}(x \| 0^{n-m-\gamma_3}) \qquad (4)$$

where $h_{S_t^{(r,s)}}$ is the hash function associated with the structure $S_t^{(r,s)}$. Note that the amount of padding done to the string $x$ is at most $n - m - 1$.

Remark: One constraint for using the structure $\mathcal{T}_t$ is that the $(n, m)$ compression function $h$ must satisfy $n \ge 2m$. The structure $S_t^{(r,s)}$ contains the structure $\mathcal{T}_t$ and hence this constraint also holds for $S_t^{(r,s)}$. However, from a practical point of view this is not really a constraint, since all known practical compression functions satisfy this condition.
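The parameter selection in Steps 1 to 3 amounts to two divisions with remainders shifted into the ranges {1, ..., δ(t)} and {1, ..., n − m}. The following Python sketch computes r, s and the number of padded zeros from the message length; the function name, the returned tuple layout and the toy parameters are assumptions for illustration. The sketch additionally assumes |x| > λ(t) so that r is non-negative.

```python
def parallel_params(length: int, n: int, m: int, t: int) -> tuple[int, int, int]:
    """Return (r, s, pad) for a message of the given bit length."""
    lam = (2**(t + 1) - 2) * (n - m) + m        # lambda(t)
    delta = 2**(t - 1) * (n - m)                # delta(t)
    assert length > lam, "sketch assumes |x| > lambda(t) so that r >= 0"
    # Step 1: |x| - lambda(t) = r*delta(t) + gamma1 with gamma1 in {1..delta(t)}
    r, gamma1 = divmod(length - lam - 1, delta)
    gamma1 += 1
    # Step 2: gamma1 = gamma2*(n-m) + gamma3 with gamma3 in {1..n-m}
    gamma2, gamma3 = divmod(gamma1 - 1, n - m)
    gamma3 += 1
    # Step 3: pad with n-m-gamma3 zeros; s = gamma2 + 1
    return r, gamma2 + 1, n - m - gamma3


n, m, t = 8, 4, 2                               # toy parameters (assumption)
r, s, pad = parallel_params(50, n, m, t)
print(r, s, pad)                                # 2 2 2
# the padded length matches the input length of the structure with these r, s
print(50 + pad == (2**(t + 1) + r * 2**(t - 1) + s - 2) * (n - m) + m)  # True
```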


We now define $H_t^* : \cup_{i \ge 1} \{0,1\}^i \to \{0,1\}^m$ in the following manner. First, for the sake of convenience, we need to define a function $g_0 : \cup_{i=n+1}^{2n-m-1} \{0,1\}^i \to \{0,1\}^m$ in the following manner. Let $x$ be a string with $n < |x| < 2n - m$ and $w = x \| 0^{2n-m-|x|}$. Write $w = w_1 \| w_2$ where $|w_1| = n$ and $|w_2| = n - m$. We define $g_0(x) = h(h(w_1) \| w_2)$.

$$
H_t^*(x) =
\begin{cases}
h(x \| 0^{n-|x|}) & \text{if } |x| \le n; \\
g_0(x) & \text{if } n < |x| < 2n - m; \\
g_i^*(x) & \text{if } \lambda(i) \le |x| < \lambda(i + 1) \text{ and } 1 \le i < t; \\
g_t^*(x) & \text{if } |x| \ge \lambda(t).
\end{cases}
\qquad (5)
$$

The desired function $H_t^\infty$ is obtained from the function $H_t^*$ using the construction of Section 6.

Computation of $H_t^\infty(x)$

1. Let $X_0 = x$ and for $i \ge 1$, $X_i = \chi(X_{i-1}) = \chi^i(X_0)$.

2. Let $j$ be the least positive integer such that $|X_j| \le n - m$.

3. Set $Y_0 = H_t^*(IV \| 0 \| X_0)$.

4. For $i = 1$ to $j - 1$ set $Y_i = H_t^*(Y_{i-1} \| 1 \| X_i)$.

5. Set $Y_j = H_t^*(Y_{j-1} \| X_j^r)$.

6. Output $Y_j$.

The collision resistance of $H_t^\infty$ is proved in a manner similar to that of Theorem 5 and this gives us the following result.

Theorem 8 Let $H_t^\infty$ be the hash function defined as above. Then, a collision for $H_t^\infty$ yields a collision for $h$.
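The case analysis in the definition (5) depends only on the length of the input. As a small illustration, the Python sketch below returns which branch would be taken for a given length; the function names and the tuple encoding of the branches are assumptions, and the actual functions h, g_0 and g_i^* are not implemented here.

```python
def lam(i: int, n: int, m: int) -> int:
    """lambda(i) = (2^(i+1) - 2)(n - m) + m."""
    return (2**(i + 1) - 2) * (n - m) + m


def select_branch(length: int, n: int, m: int, t: int) -> tuple[str, int]:
    """Return ('h', 0), ('g0', 0) or ('g*', i), following definition (5)."""
    assert length >= 1 and t >= 1 and n > m
    if length <= n:
        return ("h", 0)
    if length < 2 * n - m:                   # note that lambda(1) = 2n - m
        return ("g0", 0)
    for i in range(1, t):
        if lam(i, n, m) <= length < lam(i + 1, n, m):
            return ("g*", i)                 # fewer than 2^(t-1) processors needed
    return ("g*", t)                         # |x| >= lambda(t)


n, m, t = 8, 4, 3                            # toy parameters (assumption)
for length in (5, 10, 12, 27, 28, 60, 1000):
    print(length, select_branch(length, n, m, t))
# 5 -> ('h', 0), 10 -> ('g0', 0), 12 and 27 -> ('g*', 1),
# 28 -> ('g*', 2), 60 and 1000 -> ('g*', 3)
```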

9 Conclusion

We have considered the problem of securely extending the domain of a collision resistant hash function using an arbitrary DAG. A new efficient construction has been presented. This construction improves upon the general Merkle-Damgård algorithm both in the amount of padded bits and the number of invocations of the compression function. The proof of collision resistance of our construction requires the compression function to be only collision resistant (one-wayness is not used in the proof). In this paper, we have entirely concentrated on the property of collision resistance. In fact, all the domain extending techniques considered here also preserve the property of pre-image resistance.

Acknowledgement: The construction in Section 6 was incorrect in an earlier version of the paper. The error was discovered while discussing the paper with several other people. We would like to thank Rana Barua, Mridul Nandi and Bimal Roy for this.

References

[1] M. Bellare and P. Rogaway, Collision-resistant hashing: towards making UOWHFs practical, in: Proceedings of Crypto 1997, Lecture Notes in Computer Science, volume 1294, Springer, 1997, pp. 470-484.

[2] I. B. Damgård, A design principle for hash functions, in: Proceedings of Crypto 1989, Lecture Notes in Computer Science, volume 435, Springer, 1990, pp. 416-427.

[3] W. Diffie and M. E. Hellman, New directions in cryptography, IEEE Transactions on Information Theory, volume IT-22, number 6, 1976, pp. 644-654.

[4] R. C. Merkle, One way hash functions and DES, in: Proceedings of Crypto 1989, Lecture Notes in Computer Science, volume 435, Springer, 1990, pp. 428-446.

[5] B. Preneel, The state of cryptographic hash functions, in: Lectures on Data Security: Modern Cryptology in Theory and Practice, Lecture Notes in Computer Science, volume 1561, Springer, 1999, pp. 158-182.

[6] D. R. Stinson, Some observations on the theory of cryptographic hash functions, Designs, Codes and Cryptography, to appear.

[7] D. R. Stinson, Cryptography: Theory and Practice, second edition, CRC Press, 2002.