
Domain Extender for Collision Resistant Hash Functions: Improving Upon Merkle-Damgård Iteration

Palash Sarkar

Cryptology Research Group

Applied Statistics Unit

Indian Statistical Institute

203, B.T. Road, Kolkata

India 700108

palash@isical.ac.in

Abstract

We study the problem of securely extending the domain of a collision resistant compression function.

A new construction based on directed acyclic graphs is described. This generalizes the usual iterated

hashing constructions. Our main contribution is to introduce a new technique for hashing arbitrary

length strings. Combined with DAG based hashing, this technique gives a new hashing algorithm. The

amount of padding and the number of invocations of the compression function required by the new
algorithm are smaller than those of the general Merkle-Damgård algorithm. Lastly, we describe the design of a

new parallel hash algorithm.

Keywords: hash function, compression function, composition principle, collision resistance, directed

acyclic graph.

1 Introduction

Hash functions are a basic cryptographic primitive and are used extensively in digital signature protocols.

For such applications, a hash function must satisfy certain necessary properties including collision resistance

and pre-image resistance. Collision resistance implies that it should be computationally intractable to find

two elements in the domain which are mapped to the same element in the range. On the other hand,

pre-image resistance means that given an element of the range, it should be computationally intractable

to find its pre-image.

The construction of collision resistant and pre-image resistant hash functions is of both practical and

theoretical interest. Most practical hash functions are designed from scratch. The advantage of designing

a hash function from scratch is that one can use simple logical/arithmetic operations to design the algorithm

and hence achieve very high speeds. The disadvantage is that we obtain no proof of collision resistance.

Hence a user has to assume that the function is collision resistant. A well accepted intuition in this area is

that it is more plausible to assume a function to be collision resistant when the domain is fixed (and small)

rather than when it is infinite (or very large). A fixed domain function which is assumed to be collision

resistant is often called a compression function.

For practical use, it is required to hash messages of arbitrary lengths. Hence one must look for methods

which extend the domain of a compression function in a “secure” manner, i.e., the extended domain hash

function is collision resistant provided the compression function is collision resistant. Any method which

achieves this is often called a composition principle.


Composition principles based on iterated applications of the compression function are known and these

are called variants of the Merkle-Damgård algorithm [2, 4]. The most general of these algorithms can
hash arbitrarily long messages and assumes the compression function to be only collision resistant. Other
variants can hash messages only up to a maximum length or assume the compression function to be
both collision resistant and one-way. See Section 3 for a detailed discussion of several variants of the
Merkle-Damgård algorithm.

Our Contributions: In this paper, we are concerned with the problem of constructing a hash function

which can hash arbitrarily long messages and which can be proved to be collision resistant under the

assumption that the compression function is collision resistant. To justify the non-triviality of the problem

we describe a construction which can be proved to be secure if the compression function is both collision

resistant and one-way while it is insecure if the compression function is only collision resistant.

The first step in our construction is to consider a very general class of domain extending algorithms.

The structure of any algorithm in the class that we consider can be described using a directed acyclic graph

(DAG). In Section 5, we provide a construction of a secure domain extending algorithm using an arbitrary

DAG. The Merkle-Damgård algorithm uses a dipath and is a special case of DAG based algorithms.

Our main contribution (in Section 6) is to provide a solution to the problem of hashing arbitrary length

strings for DAG based algorithms. Our algorithm improves upon the (general) Merkle-Damgård algorithm

both in terms of padding length and number of invocations. Our construction can be proved to be collision

resistant under the assumption that the compression function is only collision resistant.

In Section 8, we provide some concrete examples of hashing structures and show that these can be

combined nicely to design a parallel hash function. We note, however, that we do not provide a detailed

specification of an actual hash function. Such a specification will necessarily involve many practical and

implementation issues which are not really within the scope of the current work.

A theoretical justification of our work is provided by the fact that our results improve upon a fifteen-year-old
classical work. Since our work improves upon the Merkle-Damgård algorithm, a natural question

is whether further improvements are possible. This naturally leads to the problem of obtaining non-trivial

lower bounds (and optimal algorithms) on padding lengths and number of invocations. These problems

can provide motivation for future research.

2 Preliminaries

We write $|x|$ for the length of a string $x$ and $x_1 \| x_2$ for the concatenation of two strings $x_1$ and $x_2$. The
reverse of the string $x$ will be denoted by $x^r$. By an $(n,m)$ function we will mean a function which maps
$\{0,1\}^n$ to $\{0,1\}^m$. All logarithms in the paper are in base two.

For $n > m$, let $h$ be an $(n,m)$ function. Two $n$-bit strings $x$ and $x'$ are said to collide for $h$ if
$x \ne x'$ but $h(x) = h(x')$. A hash function $h : X \to Y$ is said to be collision resistant if it is computationally
intractable to find collisions for $h$. A formal definition of this concept requires the consideration of a family
of functions (see [2, 5]).

In this paper, we are interested in “securely” extending the domain of a hash function. More precisely,

given an $(n,m)$ function $h : \{0,1\}^n \to \{0,1\}^m$, with $n > m+1$, we construct a function $h^\infty : \bigcup_{i \ge 1} \{0,1\}^i \to \{0,1\}^m$,
such that one can prove the following: Given any collision for $h^\infty$, it is possible to obtain a

collision for h. The last statement is formalized in terms of a Turing reduction between two suitably

defined problems (see below). The advantage of this method is that we only prove a reduction and at no

point are we required to use a formal definition of collision resistance. This approach has been previously

used in the study of hash functions [6].


We now turn to the task of defining our approach to reducibilities between different problems related

to the property of collision resistance. Consider the following problem as defined in [6].

Problem:  Collision Col(n,m)
Instance: An (n,m) hash function h.
Find:     $x, x' \in \{0,1\}^n$ such that $x \ne x'$ and $h(x) = h(x')$.

By an $(\epsilon, q)$ (probabilistic) algorithm for Collision we mean an algorithm which invokes the hash function
$h$ at most $q$ times and solves Col(n,m) with probability of success at least $\epsilon$.
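To make the notion of an $(\epsilon, q)$ algorithm concrete, the following Python sketch searches for a collision in a toy (16, 8) function by plain enumeration. The names `toy_h`, `N`, `M` and `find_collision` are assumptions introduced only for this illustration; `toy_h` truncates SHA-256 to 8 bits and is of course not collision resistant at that size. By the pigeonhole principle, $q = 2^m + 1$ invocations always suffice, so for this toy function the search succeeds with probability 1.

```python
import hashlib
from itertools import product

N, M = 16, 8  # toy parameters (assumption): inputs of 16 bits, outputs of 8 bits

def toy_h(block: str) -> str:
    """Toy (N, M) function on '0'/'1' strings; illustration only."""
    if len(block) != N or set(block) - {"0", "1"}:
        raise ValueError("expected an N-bit string of '0'/'1' characters")
    digest = hashlib.sha256(block.encode("ascii")).digest()
    return "".join(f"{byte:08b}" for byte in digest)[:M]

def find_collision(q: int):
    """Invoke toy_h at most q times; return (x, x') with x != x' and equal digests, or None."""
    seen = {}  # digest -> first input that produced it
    for count, bits in enumerate(product("01", repeat=N), start=1):
        if count > q:
            return None
        x = "".join(bits)
        y = toy_h(x)
        if y in seen:
            return seen[y], x
        seen[y] = x
    return None

pair = find_collision(2 ** M + 1)
```

With a budget of only one invocation the same search necessarily fails, which is the sense in which the success probability depends on $q$.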

The domain of h is the set of all n-bit strings. We would like to extend the domain to the set of all

nonempty binary strings, i.e., to construct a function $h^\infty : \bigcup_{i \ge 1} \{0,1\}^i \to \{0,1\}^m$. We would like to relate
the difficulty of finding collisions for $h^\infty$ to that of finding collisions for $h$. Thus, we consider the following

problem.

Problem:  Arbitrary length collision ALC(n,m,L)
Instance: An (n,m) hash function h and an integer L ≥ 1.
Find:     $x, x' \in \bigcup_{i=1}^{L} \{0,1\}^i$ such that $x \ne x'$ and $h^\infty(x) = h^\infty(x')$.

By an $(\epsilon, q, L)$ (probabilistic) algorithm $A$ for Arbitrary length collision we will mean an algorithm that
makes at most $q$ invocations of the function $h$ and solves ALC(n,m,L) with probability of success at least
$\epsilon$.

Later we show Turing reductions from Collision to Arbitrary Length Collision. Informally, this means

that given oracle access to an algorithm for solving ALC(n,m,L) for $h^\infty$ it is possible to construct an
algorithm to solve Col(n,m) for $h$. These will show that our constructions preserve the intractability of

finding collisions.

Pre-image resistance: This is an important property for cryptographic hash functions. Informally, this
means that given $y \in \{0,1\}^m$, it is computationally infeasible to find an $x$ such that $h(x) = y$. Pre-image
resistance (or one-wayness) is a crucially important property on its own. On the other hand, this property
is sometimes used to prove security of domain extending techniques for collision resistant hash functions.
Suppose the domain of an $(n,m)$ hash function $h$ is extended to obtain the hash function $h^\infty$. For certain
constructions [2], one can show that $h^\infty$ is collision resistant if $h$ is both collision resistant and one-way.
We would like to emphasize that this is not the approach we will take in this paper. In our constructions,
we will assume $h$ to be only collision resistant.

3 Iterated Hashing

In this section, we briefly review iterative techniques for extending the domain of a collision resistant

compression function. These techniques are attributed to [4, 2] and are commonly called the Merkle-Damgård constructions.

Let $h$ be an $(n,m)$ compression function and $IV$ be an $m$-bit string. Each of the domain extending
methods described below uses $IV$ and $h$ to construct a new function which can hash “long” strings to obtain
an $m$-bit digest. The IV can be chosen randomly, but once chosen it cannot be changed and becomes part of

the specification for the extended domain hash function.


3.1 Construction I: Basic Iteration

We define a hash function $H^{(I)}$ whose domain consists of all binary strings whose length is a multiple of
$(n-m)$. Let $x$ be a message whose length is $i(n-m)$ for some $i \ge 1$. We write $x = x_1 \| \cdots \| x_i$, where
each $x_j$ is a string of length $(n-m)$. Define $z_1 = h(IV \| x_1)$ and for $j > 1$, define $z_j = h(z_{j-1} \| x_j)$. The
digest of $x$ under $H^{(I)}$ is defined to be $z_i$, i.e., $H^{(I)}(x) = z_i$.

The function $H^{(I)}$ can be proved to be collision resistant. Briefly, the argument proceeds as follows.
Suppose $x$ and $x'$ are two strings such that $x \ne x'$ and $H^{(I)}(x) = H^{(I)}(x')$. If we have $|x| = |x'|$, then an
easy backward induction shows that there must be a collision for the function $h$. On the other hand, if
$|x| \ne |x'|$, then it can be argued that the collision for $H^{(I)}$ either leads to a collision for $h$ or a pre-image
of $IV$ under $h$. Thus, if we assume that $h$ is both collision resistant and pre-image resistant, then $H^{(I)}$ is
collision resistant.
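The basic iteration of Construction I can be sketched in Python as follows. This is only an illustration under stated assumptions: bit strings are represented as Python strings of '0' and '1', the parameters `N = 16` and `M = 8` and the all-zero `IV` are arbitrary toy choices, and `toy_h` (a truncated SHA-256) merely stands in for a compression function $h$; none of these names come from the paper.

```python
import hashlib

N, M = 16, 8      # toy parameters (assumption): h maps 16 bits to 8 bits
IV = "0" * M      # fixed m-bit initial value (assumption: all zeros)

def toy_h(block: str) -> str:
    """Toy (N, M) compression function on '0'/'1' strings; illustration only."""
    if len(block) != N or set(block) - {"0", "1"}:
        raise ValueError("expected an N-bit string of '0'/'1' characters")
    digest = hashlib.sha256(block.encode("ascii")).digest()
    return "".join(f"{byte:08b}" for byte in digest)[:M]

def hash_I(x: str) -> str:
    """Construction I: z_1 = h(IV || x_1), z_j = h(z_{j-1} || x_j), digest z_i."""
    step = N - M
    if len(x) == 0 or len(x) % step != 0:
        raise ValueError("message length must be a positive multiple of n - m")
    z = IV
    for j in range(0, len(x), step):
        z = toy_h(z + x[j:j + step])  # chain the previous digest with the next block
    return z
```

A message of $i$ blocks costs exactly $i$ invocations of `toy_h`, and messages whose length is not a multiple of $n - m$ are rejected, which is the length restriction of this construction.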

3.2 Construction II: General Construction

Our description of the general version (which appears in [2]) is from [7] for the case n − m > 1. (The case

n − m = 1 is a little more complicated. We do not mention it here since we will not consider such values

of n and m for our constructions.)

Let $H^{(II)}$ be the extended domain hash function which is to be defined. Let $x$ be a message to be
hashed; we have to define the digest $H^{(II)}(x)$. Write $x = x_1 \| x_2 \| \cdots \| x_k$, where $|x_1| = |x_2| = \cdots =
|x_{k-1}| = n-m-1$ and $|x_k| = n-m-1-d$ with $0 \le d \le n-m-2$. For $1 \le i \le k-1$, let $y_i = x_i$;
$y_k = x_k \| 0^d$ and $y_{k+1}$ is the $(n-m-1)$-bit binary representation of $d$. Define $z_1 = h(IV \| 0 \| y_1)$ and for
$1 \le i \le k$, define $z_{i+1} = h(z_i \| 1 \| y_{i+1})$. The digest of $x$ under $H^{(II)}$ is $z_{k+1}$, i.e., $H^{(II)}(x) = z_{k+1}$.

Note that the domain consists of all possible binary strings, i.e., there is no length restriction on the
input message $x$. It can be shown that $H^{(II)}$ is collision resistant assuming $h$ to be only collision resistant.

(See [7] for a proof.)
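The steps of Construction II can be sketched in Python as follows. As before, this is a hedged illustration: bit strings are '0'/'1' strings, `N`, `M` and the all-zero `IV` are toy assumptions, and `toy_h` (a truncated SHA-256) is only a placeholder for $h$.

```python
import hashlib

N, M = 16, 8      # toy parameters (assumption); note n - m > 1 as required
IV = "0" * M      # fixed m-bit initial value (assumption: all zeros)

def toy_h(block: str) -> str:
    """Toy (N, M) compression function on '0'/'1' strings; illustration only."""
    if len(block) != N or set(block) - {"0", "1"}:
        raise ValueError("expected an N-bit string of '0'/'1' characters")
    digest = hashlib.sha256(block.encode("ascii")).digest()
    return "".join(f"{byte:08b}" for byte in digest)[:M]

def hash_II(x: str) -> str:
    """Construction II: blocks of n-m-1 bits, a final block encoding d, 0/1 flag bits."""
    b = N - M - 1                      # payload bits per invocation
    if not x:
        raise ValueError("message must be nonempty")
    k = -(-len(x) // b)                # k = ceil(|x| / (n-m-1))
    d = k * b - len(x)                 # number of zeros appended, 0 <= d <= n-m-2
    padded = x + "0" * d
    ys = [padded[i:i + b] for i in range(0, len(padded), b)]
    ys.append(format(d, f"0{b}b"))     # y_{k+1}: (n-m-1)-bit representation of d
    z = toy_h(IV + "0" + ys[0])        # first invocation is flagged with 0
    for y in ys[1:]:
        z = toy_h(z + "1" + y)         # every later invocation is flagged with 1
    return z
```

The loop makes $k + 1 = 1 + \lceil |x|/(n-m-1) \rceil$ invocations in total, one more than the number of message blocks.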

3.3 Construction III: SHA Family Construction

The specification of the SHA family of constructions uses a variant of the iterative hashing technique. We

denote this variant by $H^{(III)}$.

Let $x$ be the message to be hashed. First we form the string $\mathrm{pad}(x) = x \| 1 \| 0^k \| \mathrm{bin}_c(|x|)$, where $c$ is a
constant such that $c < n-m$, $\mathrm{bin}_c(|x|)$ is the $c$-bit binary representation of $|x|$ and $k$ is the least non-negative
integer such that $|x| + 1 + k \equiv (n-m-c) \bmod (n-m)$, or equivalently $|x| + c + 1 + k \equiv 0 \bmod (n-m)$.
The length of $\mathrm{pad}(x)$ is equal to $l(n-m)$ for some $l \ge 1$. (For SHA-256, $n = 768$, $m = 256$ and $c = 64$.)
The message digest is defined to be $H^{(III)}(x) = H^{(I)}(\mathrm{pad}(x))$.

This construction can only handle messages of lengths less than $2^c$. Putting $c = 64$ (as in SHA-256) is
usually sufficient for all practical purposes. The maximum amount of padding is $n-m$ which is a constant,

i.e., independent of the message length.
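The padding rule of Construction III can be sketched in Python as follows. The function name `pad_III` and its parameters `block` (standing for $n - m$) and `c` are assumptions made for this illustration, and bit strings are again '0'/'1' strings. The digest would then be obtained by feeding the padded string to the basic iteration $H^{(I)}$.

```python
def pad_III(x: str, block: int, c: int) -> str:
    """Return x || 1 || 0^k || bin_c(|x|), whose length is a multiple of block = n - m."""
    if not c < block:
        raise ValueError("need c < n - m")
    if len(x) >= 2 ** c:
        raise ValueError("message length must be less than 2^c")
    # least non-negative k with |x| + 1 + k + c = 0 (mod n - m)
    k = (-(len(x) + 1 + c)) % block
    return x + "1" + "0" * k + format(len(x), f"0{c}b")

# Toy parameters: n - m = 8, c = 4
toy_padded = pad_III("101", block=8, c=4)

# SHA-256 parameters from the text: n - m = 512, c = 64, with a 24-bit message
sha_padded = pad_III("0" * 24, block=512, c=64)
```

For the toy call, $|x| + 1 + c = 8$ is already a multiple of 8, so $k = 0$ and the result is `10110011`. For the 24-bit message with SHA-256 parameters, $k = 423$ and the padded string is exactly one 512-bit block.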

3.4 Construction IV: Another Length Bounded Construction

We define a function $H^{(IV)}$ which, like $H^{(III)}$, can hash all binary strings up to a maximum length.
Let the message be $x$. Append the minimum number of zeros to $x$ so as to make the length a multiple
of $(n-m)$. Now divide the padded $x$ into $l$ blocks $x_0, \ldots, x_{l-1}$ of length $(n-m)$ bits each. Define $y_0 = h(IV \| x_0)$ and
for $1 \le i \le l-1$, define $y_i = h(y_{i-1} \| x_i)$. Finally define $z = h(y_{l-1} \| w)$, where $w$ is the $(n-m)$-bit binary


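Table 1: Comparison of features of different constructions for a message x.

Cons. | domain sz.           | length res.                                    | padding                                                 | # invoc.                                                 | assumption on h()
I     | infinite             | $\lvert x\rvert = i(n-m)$, $i \ge 1$           | none                                                    | $\lvert x\rvert/(n-m)$                                   | c.r. and one-way
II    | infinite             | none                                           | $2n-m-2+\lceil \lvert x\rvert/(n-m-1) \rceil$           | $1+\lceil \lvert x\rvert/(n-m-1) \rceil$                 | c.r.
III   | $2^c$, $c < n-m$     | $\lvert x\rvert < 2^c$, $c < n-m$              | $m$                                                     | $a+\lceil \lvert x\rvert/(n-m) \rceil$, $a \in \{0,1\}$  | c.r.
IV    | $< 2^{n-m}$          | $\lvert x\rvert < 2^{n-m}$                     | $2n-m-1$                                                | $1+\lceil \lvert x\rvert/(n-m) \rceil$                   | c.r.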

representation of $|x|$, i.e. $w = \mathrm{bin}_{n-m}(|x|)$. The digest of $x$ is $z$. Clearly, this algorithm can be applied only
when the length of $x$ is less than $2^{n-m}$. Again, this construction can be proved to be collision resistant
assuming $h$ to be only collision resistant.
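Construction IV (Section 3.4) can be sketched in Python as follows. This is an illustration under the same kind of assumptions as elsewhere: '0'/'1' strings for bit strings, toy parameters `N` and `M`, an all-zero `IV`, and `toy_h` (a truncated SHA-256) as a placeholder for $h$.

```python
import hashlib

N, M = 16, 8      # toy parameters (assumption): h maps 16 bits to 8 bits
IV = "0" * M      # fixed m-bit initial value (assumption: all zeros)

def toy_h(block: str) -> str:
    """Toy (N, M) compression function on '0'/'1' strings; illustration only."""
    if len(block) != N or set(block) - {"0", "1"}:
        raise ValueError("expected an N-bit string of '0'/'1' characters")
    digest = hashlib.sha256(block.encode("ascii")).digest()
    return "".join(f"{byte:08b}" for byte in digest)[:M]

def hash_IV(x: str) -> str:
    """Construction IV: zero-pad, iterate h over the blocks, then hash in bin_{n-m}(|x|)."""
    step = N - M
    if not x or len(x) >= 2 ** step:
        raise ValueError("need 1 <= |x| < 2^(n-m)")
    padded = x + "0" * ((-len(x)) % step)   # minimum number of zeros
    y = IV
    for i in range(0, len(padded), step):
        y = toy_h(y + padded[i:i + step])
    w = format(len(x), f"0{step}b")         # (n-m)-bit encoding of the original length
    return toy_h(y + w)
```

The final invocation on the length block `w` is what separates messages that differ only in trailing zeros, since zero padding alone would map them to the same block sequence.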

3.5 Role of IV

Each of the constructions described above uses an m-bit string as an IV. The IV is essential in Construction I,

since in this construction we require h to be such that it is infeasible to find a pre-image of IV under h. On

the other hand, for Constructions II to IV, we can replace IV by the initial m bits of the message without

affecting the collision resistance of the extended domain hash function. If we do this, then in certain cases,

we can hash an extra m bits without increasing the number of invocations of h. In general, this is not

a significant gain, though it may become significant if we repeatedly hash short messages such as digital

certificates.

3.6 Discussion

In Table 1, we compare the properties of the different constructions. For each construction, we provide

the size of the extended domain; the restriction on the lengths of messages to be hashed; the maximum

amount of padding; the maximum number of invocations of h() that are made while extending the domain;

and the security assumption made on h(). (In our count of the number of padded bits, we also include the

IV.) The first construction is proved to be collision resistant under the assumption that h() is both collision

resistant and one-way, while the other three constructions can be proved to be collision resistant under the

assumption that h() is only collision resistant. Construction II can handle arbitrary length strings, while
Constructions III and IV can handle only bounded length strings. On the other hand, Constructions III

and IV are more efficient than Construction II.
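The efficiency gap can be checked numerically. The Python sketch below counts invocations of $h$ directly from the definitions of Constructions II, III and IV given in Sections 3.2 to 3.4; the function name `invocations` and the choice of a message of $10^6$ bits are assumptions made for illustration, while the parameters $n = 768$, $m = 256$, $c = 64$ are the SHA-256 values quoted earlier.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)

def invocations(length: int, n: int, m: int, c: int):
    """Number of calls to h made by Constructions II, III and IV on a message of `length` bits."""
    block = n - m
    count_II = 1 + ceil_div(length, block - 1)      # k message blocks plus the block encoding d
    k = (-(length + 1 + c)) % block                 # zeros added by pad(x) in Construction III
    count_III = (length + 1 + k + c) // block       # |pad(x)| / (n - m)
    count_IV = 1 + ceil_div(length, block)          # l message blocks plus the length block
    return count_II, count_III, count_IV

counts = invocations(10 ** 6, n=768, m=256, c=64)
```

For this message length the counts are 1958, 1954 and 1955 respectively: Construction II loses one payload bit per invocation to the flag bit, and that loss accumulates as the message grows.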

Question: The theoretical question that now arises is whether it is possible to obtain a construction which
can handle arbitrary length strings, whose collision resistance is based only on the collision resistance of h
and which is more efficient than Construction II.
