Conference Paper

Constructing a Set of Weak Values for Full-round MD4 Hash Function

... Later it was done faster in [7]-[9]. In 2020, an MD4-based function was constructed and the full (48-step) version of this function was inverted [10]. In 2022, 40-, 41-, 42-, and 43-step versions of the MD4 compression function were inverted [1]. ...
Article
MD4 and MD5 are fundamental cryptographic hash functions proposed in the early 1990s. MD4 consists of 48 steps and produces a 128-bit hash given a message of arbitrary finite size. MD5 is a more secure 64-step extension of MD4. Both MD4 and MD5 are vulnerable to practical collision attacks, yet it is still not realistic to invert them, i.e., to find a message given a hash. In 2007, the 39-step version of MD4 was inverted by reducing to SAT and applying a CDCL solver along with the so-called Dobbertin’s constraints. As for MD5, in 2012 its 28-step version was inverted via a CDCL solver for one specified hash without adding any extra constraints. In this study, Cube-and-Conquer (a combination of CDCL and lookahead) is applied to invert step-reduced versions of MD4 and MD5. For this purpose, two algorithms are proposed. The first one generates inverse problems for MD4 by gradually modifying the Dobbertin’s constraints. The second algorithm tries the cubing phase of Cube-and-Conquer with different cutoff thresholds to find the one with the minimum runtime estimate of the conquer phase. This algorithm operates in two modes: (i) estimating the hardness of a given propositional Boolean formula; (ii) incomplete SAT solving of a given satisfiable propositional Boolean formula. While the first algorithm is focused on inverting step-reduced MD4, the second one is not area-specific and is therefore applicable to a variety of classes of hard SAT instances. In this study, 40-, 41-, 42-, and 43-step MD4 are inverted for the first time via the first algorithm and the estimating mode of the second algorithm. Also, 28-step MD5 is inverted for four hashes via the incomplete SAT solving mode of the second algorithm. For three of these hashes, this is done for the first time.
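The threshold search performed by the second algorithm can be illustrated with a minimal Python sketch. It assumes the conquer-phase runtime is estimated by timing a random sample of cubes and scaling the mean by the number of cubes; that sampling scheme, and the callbacks `generate_cubes` and `solve_time`, are hypothetical stand-ins for a lookahead solver and a CDCL solver, not the authors' implementation.

```python
import random
import statistics


def estimate_conquer_runtime(cubes, solve_time, sample_size, rng):
    """Estimate the total conquer time as (mean time on a random sample) * (number of cubes)."""
    sample = rng.sample(cubes, min(sample_size, len(cubes)))
    mean_time = statistics.mean([solve_time(cube) for cube in sample])
    return mean_time * len(cubes)


def best_cutoff(thresholds, generate_cubes, solve_time, sample_size=100, seed=0):
    """Try each cutoff threshold and return (threshold, estimate) with the smallest estimate.

    generate_cubes(threshold) -> list of cubes   (hypothetical cubing phase)
    solve_time(cube)          -> seconds, float  (hypothetical CDCL run on one cube)
    """
    rng = random.Random(seed)
    best = None
    for threshold in thresholds:
        cubes = generate_cubes(threshold)
        if not cubes:
            continue  # nothing to conquer for this threshold
        estimate = estimate_conquer_runtime(cubes, solve_time, sample_size, rng)
        if best is None or estimate < best[1]:
            best = (threshold, estimate)
    return best
```

The trade-off this sketch exposes is the usual one: a threshold that yields few cubes leaves each cube hard, while one that yields many cubes makes each cube easy but multiplies their number, so the estimate is minimized somewhere in between.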
Article
In this paper we describe a class of cryptographic guess-and-determine attacks which is based on the notion of a linearizing set. A linearizing set-based attack is applied to a system of Multivariate Quadratic equations (MQ) over the field GF(2), which encodes how a considered cryptographic function works. By substituting into such an MQ system a random (in some strict sense) assignment of variables from a linearizing set, we aim to transform the system into a linear one. We introduce the probability of such an event and call it the probability of linearization. Then we describe a guess-and-determine attack, the hardness of which can be expressed via the probability of linearization. To estimate the latter, it is possible to use a simple Monte Carlo algorithm. We also describe a technique that augments a considered MQ system with new linear equations and constructs a new MQ system for which the probability of linearization is usually larger than for the original one. For this purpose we apply a SAT oracle to a Boolean formula that is naturally associated with the considered MQ system. Finally, we reduce the problem of searching for a linearizing set that yields the most effective guess-and-determine attack to a pseudo-Boolean optimization problem, which can be solved using metaheuristic optimization algorithms. An important consequence is that guess-and-determine attacks can be constructed automatically by solving the corresponding optimization problem. In the computational experiments we used the proposed methodology to construct attacks on several well-known stream ciphers. The runtime estimations of some of the attacks make it possible to implement them in reasonable time.
Preprint
MD4 and MD5 are seminal cryptographic hash functions proposed in the early 1990s. MD4 consists of 48 steps and produces a 128-bit hash given a message of arbitrary finite size. MD5 is a more secure 64-step extension of MD4. Both MD4 and MD5 are vulnerable to practical collision attacks, yet it is still not realistic to invert them, i.e. to find a message given a hash. In 2007, the 39-step version of MD4 was inverted via reducing to SAT and applying a CDCL solver along with the so-called Dobbertin's constraints. As for MD5, in 2012 its 28-step version was inverted via a CDCL solver for one specified hash without adding any additional constraints. In this study, Cube-and-Conquer (a combination of CDCL and lookahead) is applied to invert step-reduced versions of MD4 and MD5. For this purpose, two algorithms are proposed. The first one generates inversion problems for MD4 by gradually modifying the Dobbertin's constraints. The second algorithm tries the cubing phase of Cube-and-Conquer with different cutoff thresholds to find the one with minimal runtime estimation of the conquer phase. This algorithm operates in two modes: (i) estimating the hardness of an arbitrary given formula; (ii) incomplete SAT-solving of a given satisfiable formula. While the first algorithm is focused on inverting step-reduced MD4, the second one is not area-specific and so is applicable to a variety of classes of hard SAT instances. In this study, for the first time in history, 40-, 41-, 42-, and 43-step MD4 are inverted via the first algorithm and the estimating mode of the second algorithm. 28-step MD5 is inverted for four hashes via the incomplete SAT-solving mode of the second algorithm. For three of these hashes, this is done for the first time.
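To make the notion of a step-reduced version concrete, the following is a minimal Python sketch of the MD4 compression function (following RFC 1320) with a hypothetical `steps` parameter that truncates the 48-step loop. The parameter name, the register rotation used when the step count is not a multiple of four, and the decision to keep the final feed-forward addition for reduced step counts are assumptions of this sketch, not necessarily the conventions used in the study.

```python
import struct

MASK = 0xFFFFFFFF
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)

# Per round: (Boolean function, additive constant, message word order, rotation amounts)
ROUNDS = (
    (lambda x, y, z: (x & y) | (~x & z), 0x00000000,
     (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), (3, 7, 11, 19)),
    (lambda x, y, z: (x & y) | (x & z) | (y & z), 0x5A827999,
     (0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15), (3, 5, 9, 13)),
    (lambda x, y, z: x ^ y ^ z, 0x6ED9EBA1,
     (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15), (3, 9, 11, 15)),
)


def rotl(x, s):
    """Rotate a 32-bit word left by s bits."""
    return ((x << s) | (x >> (32 - s))) & MASK


def md4_compress(block, steps=48, state=IV):
    """Run the first `steps` steps of MD4 on one 64-byte block, then add the input state.

    steps=48 gives the full compression function; smaller values give a
    step-reduced variant (register order for steps % 4 != 0 is this sketch's convention).
    """
    if len(block) != 64 or not 0 <= steps <= 48:
        raise ValueError("need a 64-byte block and 0 <= steps <= 48")
    words = struct.unpack("<16I", block)
    a, b, c, d = state
    for i in range(steps):
        f, const, order, shifts = ROUNDS[i // 16]
        t = (a + f(b, c, d) + words[order[i % 16]] + const) & MASK
        a, b, c, d = d, rotl(t, shifts[i % 4]), b, c
    return tuple((u + v) & MASK for u, v in zip(state, (a, b, c, d)))


def state_hex(state):
    """Serialize four 32-bit registers as a little-endian hex string."""
    return struct.pack("<4I", *state).hex()
```

For a message that fits into a single padded block, `state_hex(md4_compress(block))` with the default 48 steps equals the MD4 digest; passing, for example, `steps=43` yields the kind of reduced function whose inversion the abstract refers to.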
Chapter
This paper studies the problem of algebraic cryptanalysis where state-of-the-art SAT solvers are used to invert some cryptographic function. We define a new metric of the hardness of CNF formulas that encode the corresponding cryptanalysis problems. The introduced metric is similar to the well-known tree-like metrics used in the theory of propositional proofs. However, unlike the latter, the new metric can be effectively estimated in application to specific cryptographic functions. The corresponding approach combines the Monte Carlo method and metaheuristic black-box optimization algorithms. The proposed algorithms require a large amount of computational resources, and for their experimental evaluation we used a supercomputer. In the experiments, we applied the proposed metrics to construct estimations of guess-and-determine attacks on the compression function of the well-known MD4 cryptographic hash algorithm.
Keywords: Algebraic cryptanalysis; Boolean Satisfiability Problem (SAT); SAT solvers; Guess-and-determine attacks; Inverse Backdoor Set (IBS)
Chapter
In this paper we describe a new evolutionary strategy. It is based on the common (1+1) random mutation scheme, augmented with a metaheuristic technique called the merging variables principle, which we proposed in previous works. We show that the new variant of (1+1)-EA has an asymptotically lower worst-case estimate than the original (1+1)-EA. In the experimental part we compare the proposed strategy with several known variants of (1+1)-EA and demonstrate its practical applicability to a number of hard instances of the MaxSAT problem.
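For reference, the baseline (1+1) scheme mentioned in this abstract can be sketched as follows. This is the textbook (1+1)-EA with standard bit mutation at rate 1/n, applied to a toy MaxSAT instance; it does not implement the merging variables principle. The function names and the clause format (DIMACS-style signed integers) are assumptions made for illustration.

```python
import random


def count_satisfied(clauses, assignment):
    """Number of clauses with at least one true literal (literal k > 0 means variable k is True)."""
    return sum(
        any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)
        for clause in clauses
    )


def one_plus_one_ea(clauses, n_vars, iterations=2000, seed=0):
    """Textbook (1+1)-EA: flip each bit with probability 1/n, keep the child if it is not worse."""
    rng = random.Random(seed)
    parent = [rng.random() < 0.5 for _ in range(n_vars)]
    best = count_satisfied(clauses, parent)
    for _ in range(iterations):
        child = [bit ^ (rng.random() < 1.0 / n_vars) for bit in parent]
        score = count_satisfied(clauses, child)
        if score >= best:  # accepting ties lets the search drift across plateaus
            parent, best = child, score
    return parent, best
```

Calling `one_plus_one_ea([[1, 2], [-1, 3], [-2, -3], [1, -3]], n_vars=3)` returns an assignment together with the number of clauses it satisfies.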
Article
In the present paper, we propose a technology for translating algorithmic descriptions of discrete functions to SAT. The proposed technology is aimed at applications in algebraic cryptanalysis. We describe how cryptanalysis problems are reduced to SAT in such a way that it should be perceived as natural by the cryptographic community. In the theoretical part of the paper we justify the main principles of general reduction to SAT for discrete functions from a class containing the majority of functions employed in cryptography. Then, we describe the Transalg software tool developed based on these principles with SAT-based cryptanalysis specifics in mind. We demonstrate the results of applications of Transalg to construction of a number of attacks on various cryptographic functions. Some of the corresponding attacks are state of the art. We compare the functional capabilities of the proposed tool with that of other domain-specific software tools which can be used to reduce cryptanalysis problems to SAT, and also with the CBMC system widely employed in symbolic verification. The paper also presents vast experimental data, obtained using the SAT solvers that took first places at the SAT competitions in the recent several years.
Article
Propositional satisfiability (SAT) is at the nucleus of state-of-the-art approaches to a variety of computationally hard problems, one of which is cryptanalysis. Moreover, a number of practical applications of SAT can only be tackled efficiently by identifying and exploiting a subset of a formula's variables called a backdoor set (or simply a backdoor). This paper proposes a new class of backdoor sets for SAT used in the context of cryptographic attacks, namely guess-and-determine attacks. The idea is to identify the best set of backdoor variables subject to a statistically estimated hardness of the guess-and-determine attack using a SAT solver. Experimental results on weakened variants of renowned encryption algorithms show an advantage of the proposed approach over the state of the art in terms of the estimated hardness of the resulting guess-and-determine attacks.
Article
In this paper we propose the technology for constructing propositional encodings of discrete functions. It is aimed at solving inversion problems of considered functions using state-of-the-art SAT solvers. We implemented this technology in the form of the software system called Transalg, and used it to construct SAT encodings for a number of cryptanalysis problems. By applying SAT solvers to these encodings we managed to invert several cryptographic functions. In particular, we used the SAT encodings produced by Transalg to construct the family of two-block MD5 collisions in which the first 10 bytes are zeros. Also we used Transalg encoding for the widely known A5/1 keystream generator to solve several dozen of its cryptanalysis instances in a distributed computing environment. In the paper we compare in detail the functionality of Transalg with that of similar software systems.
Conference Paper
A practical digital signature system based on a conventional encryption function which is as secure as the conventional encryption function is described. Since certified conventional systems are available it can be implemented quickly, without the several years delay required for certification of an untested system.
Conference Paper
MD4 is a hash function developed by Rivest in 1990. It serves as the basis for most of the dedicated hash functions such as MD5, SHAx, RIPEMD, and HAVAL. In 1996, Dobbertin showed how to find collisions of MD4 with complexity equivalent to 2^{20} MD4 hash computations. In this paper, we present a new attack on MD4 which can find a collision with probability 2^{-2} to 2^{-6}, and the complexity of finding a collision doesn’t exceed 2^{8} MD4 hash operations. Built upon the collision search attack, we present a chosen-message pre-image attack on MD4 with complexity below 2^{8}. Furthermore, we show that for a weak message, we can find another message that produces the same hash value. The complexity is only a single MD4 computation, and a random message is a weak message with probability 2^{-122}. The attack on MD4 can be directly applied to RIPEMD which has two parallel copies of MD4, and the complexity of finding a collision is about 2^{18} RIPEMD hash operations.
Conference Paper
This paper proposes several approaches to improve the collision attack on MD4 proposed by Wang et al. First, we propose a new local collision that is the best for the MD4 collision attack. Selection of a good message difference is the most important step in achieving effective collision attacks. This is the first paper to introduce an improvement to the message difference approach of Wang et al., where we propose a new local collision. Second, we propose a new algorithm for constructing differential paths. While similar algorithms have been proposed, they do not support the new local collision technique. Finally, we complete a collision attack, and show that the complexity is smaller than the previous best work.
Article
We show that tools from circuit complexity can be used to study decompositions of global constraints. In particular, we study decompositions of global constraints into conjunctive normal form with the property that unit propagation on the decomposition enforces the same level of consistency as a specialized propagation algorithm. We prove that a constraint propagator has a polynomial size decomposition if and only if it can be computed by a polynomial size monotone Boolean circuit. Lower bounds on the size of monotone Boolean circuits thus translate to lower bounds on the size of decompositions of global constraints. For instance, we prove that there is no polynomial sized decomposition of the domain consistency propagator for the ALLDIFFERENT constraint. Comment: Proceedings of the Twenty-first International Joint Conference on Artificial Intelligence (IJCAI-09).
Article
One of the most important paradigm shifts in the use of SAT solvers for solving industrial problems has been the introduction of clause learning. Clause learning entails adding a new clause for each conflict during backtrack search. This new clause prevents the same conflict from occurring again during the search process. Moreover, sophisticated techniques such as the identification of unique implication points in a graph of implications, allow creating clauses that more precisely identify the assignments responsible for conflicts. Learned clauses often have a large number of literals. As a result, another paradigm shift has been the development of new data structures, namely lazy data structures, which are particularly effective at handling large clauses. These data structures are called lazy due to being in general unable to provide the actual status of a clause. Efficiency concerns and the use of lazy data structures motivated the introduction of dynamic heuristics that do not require knowing the precise status of clauses. This chapter describes the ingredients of conflict-driven clause learning SAT solvers, namely conflict analysis, lazy data structures, search restarts, conflict-driven heuristics and clause deletion strategies.
Article
The problem of inverting discrete functions that are deterministically computable in polynomial time is considered. The propositional approach, which is based on the technique of representing algorithms as systems of logical equations, is applied.
Conference Paper
Inverting a function f at a given point y in its range involves finding any x in the domain such that f(x) = y. This is a general problem. We wish to find a heuristic for inverting those functions which satisfy certain statistical properties similar to those of random functions. As an example, we choose popular secure hash functions which are expected to be hard to invert and any successful strategy to do so will be quite useful. This provides an excellent challenge for SAT solvers. We first find the limits of inverting via direct encoding of these functions as SAT: for MD4 this is one round and twelve steps and for MD5 it is one round and ten steps. Then, we show that by adding customized constraints obtained by modifying an earlier attack by Dobbertin, we can invert MD4 up to 2 rounds and 7 steps in
Conference Paper
MD4 is a hash function introduced by Rivest in 1990. It is still used in some contexts, and the most commonly used hash functions (MD5, Sha1, Sha2) are based on the design principles of MD4. MD4 has been extensively studied and very efficient collision attacks are known, but it is still believed to be a one-way function. In this paper we show a partial pseudo-preimage attack on the compression function of MD4, using some ideas from previous cryptanalysis of MD4. We can choose 64 bits of the output for the cost of 2^{32} compression function computations (the remaining bits are randomly chosen by the preimage algorithm). This gives a preimage attack on the compression function of MD4 with complexity 2^{96}, and we extend it to an attack on the full MD4 with complexity 2^{102}. As far as we know this is the first preimage attack on a member of the MD4 family.
Article
We argue that the random oracle model, where all parties have access to a public random oracle, provides a bridge between cryptographic theory and cryptographic practice. In the paradigm we suggest, a practical protocol P is produced by first devising and proving correct a protocol P^R for the random oracle model, and then replacing oracle accesses by the computation of an "appropriately chosen" function h. This paradigm yields protocols much more efficient than standard ones while retaining many of the advantages of provable security. We illustrate these gains for problems including encryption, signatures, and zero-knowledge proofs.
Conference Paper
In [1] it was shown that there are very effective attacks leading to collisions for the hash function MD4 designed by R. Rivest [3]. A summary of the status of hash functions of the MD4-family with respect to collision-resistance can be found in [2] and [4]. However, attacking the one-wayness of a hash function is a much more demanding challenge, and in case of success it has much more devastating consequences. No result along this line is known for MD4 and its successors. Therefore it is worth exploring how the recently developed new analytic methods for finding collisions can be applied to construct preimages or second preimages. As a first step, we state here the following partial result.
The MD4 message digest algorithm
  • R L Rivest
A design principle for hash functions
  • I Damgård