Conference Paper

Cuteforce analyzer: A distributed bruteforce attack on PDF encryption with GPUs and FPGAs

Abstract

Working on cryptanalytic tasks using a heterogeneous cluster with different types of processors (CPU, GPU, FPGA) can be advantageous compared with classical homogeneous clusters. In this paper we demonstrate that distributing cryptanalytic tasks to different types of processors can lead to better performance than can be achieved with a single type of processor. To this end, we have built a framework for managing a heterogeneous cluster and implemented a password bruteforcer for password-protected PDF documents. Our results show that such a framework can be implemented with little performance overhead.
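The abstract does not describe how the framework divides work among its processors. One common scheme for exhaustive search, sketched here under that assumption, splits the candidate index range into contiguous slices proportional to each device's measured throughput, so that CPUs, GPUs, and FPGAs finish at roughly the same time. `Device`, `partition_keyspace`, and the rates in the usage lines are hypothetical placeholders, not values from the paper.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str    # hypothetical device label, e.g. "gpu0"
    rate: float  # measured candidates per second (assumed to be known)


def partition_keyspace(total: int, devices: list[Device]) -> dict[str, range]:
    """Split candidate indices [0, total) into contiguous slices whose sizes
    are proportional to each device's throughput."""
    total_rate = sum(d.rate for d in devices)
    slices: dict[str, range] = {}
    start = 0
    for i, dev in enumerate(devices):
        if i == len(devices) - 1:
            end = total  # the last device absorbs the rounding remainder
        else:
            end = start + int(total * dev.rate / total_rate)
        slices[dev.name] = range(start, end)
        start = end
    return slices


if __name__ == "__main__":
    cluster = [Device("cpu0", 1e6), Device("gpu0", 3e7), Device("fpga0", 1.5e7)]
    for name, r in partition_keyspace(10**9, cluster).items():
        print(name, r.start, r.stop)
```

A static split like this relies on accurate rate measurements; a production scheduler would more likely hand out smaller chunks dynamically, so that a device running slower than measured does not delay the whole job.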
... As a workaround to counter direct exfiltration attacks, PDF viewers might consider dropping support for partially encrypted files based on crypt filters, as specified in PDF ≥ 1.5, and based on additional features as documented in Appendix A. While this would make standard-conforming documents unreadable (e.g., PDF documents where only the attachment is encrypted), we presume the number of affected documents is limited in practice (footnote 9). Another short-term mitigation would be enforcing a policy where unencrypted objects are no longer allowed to access encrypted content, similar to the "mixed content" warnings that modern web browsers raise, for example, when JavaScript code from an insecure resource is to be executed on a secure website (see [7]). In the long term, the PDF 2.x specification should drop support for mixed content altogether (footnote 10); the authors consider it a security nightmare. ...
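The proposed short-term mitigation, refusing to let unencrypted objects access encrypted content, can be sketched as a check over a document's object references. The object model used here (a mapping from object number to an `encrypted` flag and a list of referenced object numbers) and the names `PdfObject` and `mixed_content_violations` are hypothetical simplifications, not the structures of any real PDF library.

```python
from dataclasses import dataclass, field


@dataclass
class PdfObject:
    # Hypothetical, simplified view of an indirect PDF object.
    encrypted: bool
    refs: list[int] = field(default_factory=list)  # object numbers referenced


def mixed_content_violations(objects: dict[int, PdfObject]) -> list[tuple[int, int]]:
    """Return (source, target) pairs where an unencrypted object references
    an encrypted one, i.e. the access pattern the policy would block."""
    violations = []
    for num, obj in objects.items():
        if obj.encrypted:
            continue  # encrypted objects may reference other encrypted content
        for target in obj.refs:
            tgt = objects.get(target)
            if tgt is not None and tgt.encrypted:
                violations.append((num, target))
    return violations
```

A viewer enforcing the policy would refuse to resolve each flagged reference, much as a browser blocks insecure scripts on a secure page.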
... We separated existing research into three categories: PDF security, PDF encryption, and attacks on the encryption of different data formats. We first introduce related work covering different aspects of PDF security, such as PDF malware, insecure PDF features, ...

Footnote 9: We analyzed a dataset of 8,840 encrypted PDF documents obtained from crawling the Alexa top 1 million websites and found only 353 to contain "partial encryption", all of them due to unencrypted metadata streams.

Footnote 10: Note that there seems to be a trend in the opposite direction, with newer PDF specifications often adding flexibility (e.g., "Unencrypted Wrappers" in PDF 2.0). ...
... As a consequence, Adobe updated the key derivation function in the PDF 1.7 specification [37]. In 2013, Danczul et al. introduced a new technique to efficiently brute-force PDF passwords by distributing cryptanalysis tasks to different types of processors [9]. The authors concentrated on older PDF versions (PDF 1.1 to 1.5), which use the RC4 algorithm for encryption. ...
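For the RC4-based Standard Security Handler, each password guess costs little: one MD5 computation plus one RC4 encryption (security handler revision 2, 40-bit keys). The sketch below follows the key-derivation and user-password steps of the PDF specification (Algorithms 2 and 4 in ISO 32000-1) as we understand them; the `O`, `P`, and `ID` values must come from the target document's encryption dictionary and trailer, and the function names are our own. It handles revision 2 only; revisions 3 and 4 add 50 extra MD5 iterations and a different `U` check.

```python
import hashlib
import struct

# 32-byte padding string defined by the PDF specification.
PAD = bytes.fromhex(
    "28BF4E5E4E758A4164004E56FFFA0108"
    "2E2E00B6D0683E802F0CA9FE6453697A"
)


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4: key scheduling followed by keystream XOR."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def derive_key_r2(password: bytes, o_entry: bytes, p_value: int, id0: bytes) -> bytes:
    """Revision 2 key: MD5(padded password || O || P as int32 LE || ID[0]), first 5 bytes."""
    padded = (password + PAD)[:32]  # truncate or pad the password to 32 bytes
    h = hashlib.md5()
    h.update(padded)
    h.update(o_entry)
    h.update(struct.pack("<i", p_value))
    h.update(id0)
    return h.digest()[:5]


def check_user_password_r2(password: bytes, o_entry: bytes, p_value: int,
                           id0: bytes, u_entry: bytes) -> bool:
    """A guess is correct if RC4(key, PAD) reproduces the document's U entry."""
    key = derive_key_r2(password, o_entry, p_value, id0)
    return rc4(key, PAD) == u_entry
```

In a distributed bruteforcer, a function like `check_user_password_r2` is the inner loop that every device runs over its own slice of candidates.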
Conference Paper
The Portable Document Format, better known as PDF, is one of the most widely used document formats worldwide, and in order to ensure information confidentiality, this file format supports document encryption. In this paper, we analyze PDF encryption and show two novel techniques for breaking the confidentiality of encrypted documents. First, we abuse the PDF feature of partially encrypted documents to wrap the encrypted part of the document within attacker-controlled content and thereby exfiltrate the plaintext once the document is opened by a legitimate user. Second, we abuse a flaw in the PDF encryption specification to arbitrarily manipulate encrypted content. The only requirement is a single block of known plaintext, and we show that this requirement is fulfilled by design. Our attacks allow the recovery of the entire plaintext of encrypted documents by using exfiltration channels which are based on standard-compliant PDF properties. We evaluated our attacks on 27 widely used PDF viewers and found all of them to be vulnerable. We responsibly disclosed the vulnerabilities and supported the vendors in fixing the issues.
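The known-plaintext requirement is consistent with the general malleability of CBC mode, which PDF's AES-based security handlers use. The derivation below shows that standard CBC property rather than the paper's exact construction: with $D_k$ the block decryption, $C_0$ the IV, a known plaintext block $P_i$, and an attacker-chosen block $P^{*}$, modifying only the preceding ciphertext block makes $C_i$ decrypt to $P^{*}$.

```latex
\begin{aligned}
P_i &= D_k(C_i) \oplus C_{i-1}
  && \text{(CBC decryption)} \\
X &= D_k(C_i) = P_i \oplus C_{i-1}
  && \text{(known } P_i \text{ reveals } X\text{)} \\
C'_{i-1} &= C_{i-1} \oplus P_i \oplus P^{*}
  && \text{(attacker-modified block)} \\
D_k(C_i) \oplus C'_{i-1} &= X \oplus C_{i-1} \oplus P_i \oplus P^{*} = P^{*}
  && \text{(decrypts to the chosen } P^{*}\text{)}
\end{aligned}
```

The cost is that the block decrypted from $C'_{i-1}$ itself becomes unpredictable, unless $C_{i-1}$ is the IV.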
... They partition the application into parts, each of which is executed on the device reportedly best suited to it. Danczul et al. [28] relate their experience building a heterogeneous cluster of 16 nodes on which they implement a password bruteforcer for password-protected documents. Each node is equipped with an Nvidia GeForce GTX 680 GPU and a Xilinx ML605 FPGA development board. ...
Article
Modern HPC platforms are highly heterogeneous, tightly integrating multicore CPUs and accelerators (such as Graphics Processing Units, Intel Xeon Phis, or Field-Programmable Gate Arrays) so that they can address the twin critical concerns of performance and energy efficiency. Because of this heterogeneity, processing elements contend for shared on-chip resources such as the Last Level Cache (LLC) and interconnect, and for shared nodal resources such as DRAM and PCI-E links. This results in complexities such as resource contention, non-uniform memory access (NUMA), and accelerator-specific limitations such as limited main memory, which necessitate support for efficient out-of-card execution. Due to these complexities, the performance profiles of data-parallel applications executing on these platforms are not smooth and deviate significantly from the shapes that allowed state-of-the-art load-balancing algorithms to find optimal solutions. In this paper, we propose a hierarchical two-level data partitioning algorithm that minimizes the parallel execution time of data-parallel applications on clusters of h identical nodes, where each node has c heterogeneous processors. The algorithm takes as input c discrete speed functions of cardinality m, corresponding to the c heterogeneous processors, and makes no assumptions about the shapes of these functions. Unlike load-balancing algorithms, the optimal solutions it finds may not load-balance an application in terms of execution time. The proposed algorithm has a low time complexity of $O(m^2 \times h + m^3 \times c^3)$, whereas the state-of-the-art algorithm solving the same problem has a complexity of $O(m^3 \times c^3 \times h^3)$. We also propose an extension of the algorithm for clusters of h non-identical nodes, where each node has c heterogeneous processors.
We experimentally demonstrate the optimality of our algorithm using two well-known and highly optimized multi-threaded data-parallel applications, matrix-matrix multiplication and 2D fast Fourier transform, on a heterogeneous multi-accelerator NUMA node containing an Intel multicore Haswell CPU, an Nvidia K40c GPU, and an Intel Xeon Phi co-processor, and on a simulated homogeneous cluster of such nodes.
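The two-level algorithm itself is not reproduced here. As a much simpler illustration of the underlying problem, the sketch below uses dynamic programming to split n equal work units among c processors, each described by a discrete execution-time table (time to process x units, not assumed to be monotone or smooth), minimizing the parallel execution time, i.e. the maximum over processors. The function name `min_parallel_time` and the table input format are our assumptions; it runs in $O(c \cdot n^2)$ time and is not the algorithm from the paper.

```python
import math


def min_parallel_time(times: list[list[float]], n: int) -> tuple[float, list[int]]:
    """times[p][x] is processor p's execution time for x units (x = 0..n).
    Returns the minimal makespan and a per-processor allocation summing to n."""
    c = len(times)
    # best[p][k]: minimal makespan when processors 0..p-1 share k units
    best = [[math.inf] * (n + 1) for _ in range(c + 1)]
    choice = [[0] * (n + 1) for _ in range(c + 1)]
    best[0][0] = 0.0
    for p in range(1, c + 1):
        for k in range(n + 1):
            for x in range(k + 1):  # units assigned to processor p-1
                cand = max(best[p - 1][k - x], times[p - 1][x])
                if cand < best[p][k]:
                    best[p][k] = cand
                    choice[p][k] = x
    # Walk back through the recorded choices to recover the allocation.
    alloc = [0] * c
    k = n
    for p in range(c, 0, -1):
        alloc[p - 1] = choice[p][k]
        k -= choice[p][k]
    return best[c][n], alloc
```

Because every allocation is considered, a non-monotone time table (for example, a processor that happens to be faster on 2 units than on 1) is handled without special cases, which is the kind of irregular profile the abstract describes.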
Article
In recent years, the most powerful supercomputers have reached megawatt power consumption levels, an important issue that challenges sustainability and shows that this trend cannot be maintained. To date, the prevalent approach to supercomputing is dominated by CPUs and GPUs. Given their fixed architectures with generic instruction sets, they have benefited from many tools and mature workflows, which led to mass adoption and further growth. However, reconfigurable hardware such as FPGAs has repeatedly proven to offer substantial advantages over this approach in terms of performance and power consumption. In this survey, we review the most relevant works that advanced the field of heterogeneous supercomputing using FPGAs, focusing on their architectural characteristics. Each work is examined in terms of three main parts: network, hardware, and software tools. All implementations face challenges that involve all three parts, and these dependencies result in compromises that designers must take into account. The advantages and limitations of each approach are discussed and compared in detail. The classification and study of the architectures illustrate the trade-offs of the solutions and help identify open problems and research lines.
Article
The recovery of encrypted information systems based on password verification mechanisms is an important task in electronic forensics, data restoration, illegal information filtering, and network security maintenance. Traditional password recovery systems are mainly based on CPUs and GPUs, which have low cracking efficiency and cannot meet users' computing needs. Therefore, this paper proposes a cognitively reconfigurable, mimic-based heterogeneous password recovery system. First, applying the concept of mimic computing, a multidimensional and reconfigurable hybrid heterogeneous system is built from CPUs, GPUs, and FPGAs, coordinating software and hardware to improve the system's computing ability and the range of solution types to meet diverse cracking needs. Second, high-performance password recovery and password generation algorithms running on FPGAs are designed to expedite the calculation and verification of passwords. Third, a hierarchical password database is established to enable dynamic feedback and updating of passwords for various application scenarios, improving password recovery efficiency. Finally, through task perception and decision-making reasoning, together with dynamic structural transformation and load balancing, the computing potential of the system is fully mobilized so that the entire system can efficiently complete the encryption recovery task. Experimental results show that, compared with traditional CPU and GPU systems, our system exhibits significantly improved recovery efficiency and better supports heterogeneous systems, with high scalability and energy efficiency.
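Exhaustive password generators, in hardware or software, commonly map an integer index to a candidate string, so that each device (or each FPGA pipeline) can be handed a numeric range and generate its candidates independently. The sketch below shows that mapping for a fixed charset and length, reading the index as a number in base len(charset); the charset and the names `index_to_candidate` and `candidate_range` are illustrative assumptions, not the generator design from the paper.

```python
def index_to_candidate(index: int, charset: str, length: int) -> str:
    """Map index in [0, len(charset)**length) to a fixed-length candidate,
    treating the index as a base-len(charset) number (last char varies fastest)."""
    base = len(charset)
    if not 0 <= index < base ** length:
        raise ValueError("index out of range")
    chars = []
    for _ in range(length):
        index, digit = divmod(index, base)
        chars.append(charset[digit])
    return "".join(reversed(chars))


def candidate_range(start: int, stop: int, charset: str, length: int):
    """Yield the candidates for a contiguous index slice [start, stop)."""
    for i in range(start, stop):
        yield index_to_candidate(i, charset, length)
```

Because the mapping is stateless, a device can start anywhere in the keyspace without coordinating with the others.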
Article
The pairing of traditional multicore processors with accelerators of various forms (e.g., graphics engines, reconfigurable logic) can be referred to generally as architecturally diverse systems. Our interest in this work is truly diverse systems, in which more than one accelerator is used in the execution of an application. These systems have the potential for substantial performance gains relative to multicores alone; however, they pose significant difficulties when it comes to application development. In spite of these difficulties, the use of accelerators in high-performance computation has generally grown substantially over the past decade. This is primarily due to a pair of forces. First, with the demise of Dennard scaling, power has become a substantial limiting factor in systems development, pushing computations to be more power efficient (a strength of many accelerators). Second, the application development environments for accelerators have improved substantially in recent years. We review the use of multiple, distinct accelerators deployed in an individual system or, more to the point, used concurrently within an individual application. We give a history of architecturally diverse systems that use multiple accelerators, discuss the motivations for diversity in accelerators, and describe the approaches that both system designers and application developers have used to put accelerators to beneficial use.
Article
The recovery of encrypted information based on password authentication is an important mechanism to maintain network security. As a result, many password recovery systems have been developed. However, those systems are inefficient and energy-intensive because they are primarily optimized for CPUs and GPUs. Inspired by a new computing model, mimic computing (a hardware/software co-designed computing model that can dynamically reconfigure appropriate system structures based on application features), we propose a novel password recovery system. The design of such a system is non-trivial and involves several challenges: (1) how to build high-performance reconfigurable password recovery algorithms; (2) how to partition the hardware and software for password recovery; (3) how to optimize resource utilization and power consumption; and (4) how to improve scalability. We present our insights, design decisions, and implementation details to address these challenges. Our extensive experiments show that the newly designed password recovery system significantly outperforms traditional CPU-based and GPU-based systems in terms of both efficiency and energy consumption. In particular, our system cracks passwords 27.81 and 4.23 times faster than CPU-based and GPU-based systems, respectively, and consumes 14.97 and 5.97 times less energy than CPU-based and GPU-based systems, respectively.
Conference Paper
The complexity estimate of a hash collision algorithm is given in units of hash compression function evaluations. This paper shows that this figure can lead to false runtime estimates when the algorithm is accelerated using graphics processing units (GPUs) and field-programmable gate arrays (FPGAs). For demonstration, parts of the CPU reference implementation of Marc Stevens' SHA-1 near-collision attack are implemented on these two accelerators, taking advantage of their specific architectures. The implementation, runtime behavior, and performance of these ported algorithms are discussed, and it is shown that the acceleration results in different complexity estimates for each type of coprocessor.
Conference Paper
MD5 Crypt is a cryptographic algorithm commonly used for authentication in UNIX systems. The additional randomization provided by the salt and the complexity of the scheme render traditional password-cracking techniques ineffective on common computing systems, which guarantees the security of the system. With the recent rise of petaflop-scale heterogeneous supercomputers such as Tianhe-1A, however, MD5 Crypt once again faces the threat of brute-force attacks. Much work has been done on GPU-accelerated platforms to improve the performance of MD5 Crypt, but little gain has been achieved by using the constant memory of the CUDA architecture. This paper explores this problem and achieves a 44.6% improvement by allocating the padding array to constant memory. It also presents a highly scalable implementation of a brute-force attack on MD5 Crypt on Tianhe-1A, which was then the world's fastest heterogeneous supercomputer. The experimental results show that 326,000 MD5 hashes can be checked per second on a single computing node, 5.7 times faster than the CPU version. On multiple nodes, the implementation also scales well. Consequently, this work poses a new challenge to the security of MD5 Crypt-based authentication.
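The cost that protects MD5 Crypt comes mainly from a 1,000-round loop of MD5 computations mixing the password, salt, and intermediate digest. The sketch below is our own reimplementation of the classic `$1$` (FreeBSD md5crypt) construction with Python's `hashlib`, meant to show what every brute-force guess must compute; it is not the Tianhe-1A implementation, and the function names are ours. Inputs are bytes, and the salt is assumed to be ASCII.

```python
import hashlib

ITOA64 = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def _to64(value: int, n: int) -> str:
    # Emit n characters, 6 bits at a time, least-significant bits first.
    out = []
    for _ in range(n):
        out.append(ITOA64[value & 0x3F])
        value >>= 6
    return "".join(out)


def md5_crypt(password: bytes, salt: bytes) -> str:
    magic = b"$1$"
    salt = salt[:8]  # md5crypt uses at most 8 salt characters
    # Alternate digest: MD5(password + salt + password).
    alt = hashlib.md5(password + salt + password).digest()
    ctx = password + magic + salt
    # Append the alternate digest, repeated to cover len(password) bytes.
    remaining = len(password)
    while remaining > 0:
        ctx += alt[: min(16, remaining)]
        remaining -= 16
    # For each bit of len(password): a zero byte if set, else the first password byte.
    i = len(password)
    while i:
        ctx += b"\x00" if i & 1 else password[:1]
        i >>= 1
    final = hashlib.md5(ctx).digest()
    # 1,000 rounds: the deliberate slowdown every brute-force guess must pay.
    for r in range(1000):
        data = password if r & 1 else final
        if r % 3:
            data += salt
        if r % 7:
            data += password
        data += final if r & 1 else password
        final = hashlib.md5(data).digest()
    # Custom base64 encoding using md5crypt's byte ordering.
    groups = [(0, 6, 12), (1, 7, 13), (2, 8, 14), (3, 9, 15), (4, 10, 5)]
    encoded = "".join(
        _to64((final[a] << 16) | (final[b] << 8) | final[c], 4) for a, b, c in groups
    )
    encoded += _to64(final[11], 2)
    return f"$1${salt.decode()}${encoded}"
```

Each guess therefore costs 1,002 MD5 invocations instead of one, which is why the GPU and supercomputer work above focuses on throughput per node.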
Article
Thanks to advances in hardware technology, the Graphics Processing Unit (GPU), a highly parallel programmable processor, has been adopted in a growing number of mainstream computing systems. In this paper, we present several optimizations for Message-Digest Algorithm 5 (MD5) hash reversal, implemented on a GPU parallel computing architecture called CUDA. The performance of our solution is compared with an implementation running on an AMD II X4 945 quad-core CPU at 3.0 GHz, and the results show that our GPU-based MD5 hash reversal implementation is more than ten times faster than an optimized CPU implementation.
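In its simplest form, MD5 hash reversal is an exhaustive search: hash every candidate in a keyspace and compare each digest with the target. GPU implementations parallelize exactly this loop by giving each thread a different candidate. The CPU-only sketch below uses Python's `hashlib` and `itertools.product`; the lowercase charset, the length limit, and the name `reverse_md5` are illustrative assumptions.

```python
import hashlib
import itertools
import string


def reverse_md5(target_hex: str, charset: str = string.ascii_lowercase, max_len: int = 4):
    """Return the first candidate (shortest first) whose MD5 digest matches, or None."""
    target = bytes.fromhex(target_hex)
    for length in range(1, max_len + 1):
        for combo in itertools.product(charset, repeat=length):
            candidate = "".join(combo)
            if hashlib.md5(candidate.encode()).digest() == target:
                return candidate
    return None
```

The search space grows as len(charset) raised to the length, so each additional character multiplies the work by 26 here; this exponential growth is what makes a tenfold speedup from the GPU valuable.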
Conference Paper
Benefiting from the compute unified device architecture (CUDA) introduced by NVIDIA, graphics processing units (GPUs) have turned out to be a promising platform for cryptographic applications. In this paper we present an efficient implementation of MD5-RC4 encryption on an NVIDIA GPU using the CUDA programming framework. The MD5-RC4 encryption algorithm was implemented on an NVIDIA GeForce 9800GTX GPU, and its performance was compared with an implementation running on an AMD Sempron Processor LE-1200 CPU. The results show that our GPU-based implementation achieves a speedup of about 3 to 5 times for the MD5-RC4 encryption algorithm.
Conference Paper
The graphics processing unit (GPU) has evolved into a highly parallel, multithreaded, many-core processor with tremendous computational capability. The introduction of the compute unified device architecture (CUDA) simplifies software development on GPUs and allows direct access to GPU resources. Using the GPU as a coprocessor to the CPU for executing the hash algorithm is an effective way to improve hashing performance in high-speed network and storage systems. This paper puts forward a CUDA-based design of the MD5 hash algorithm on the GPU tailored to specific application needs, and presents its implementation as well as comprehensive optimizations based on the characteristics of the GPU and CUDA.
Conference Paper
This paper describes a heterogeneous computer cluster called Axel. Axel contains a collection of nodes; each node can include multiple types of accelerators such as FPGAs (Field-Programmable Gate Arrays) and GPUs (Graphics Processing Units). A Map-Reduce framework for the Axel cluster is presented which exploits spatial and temporal locality through different types of processing elements and communication channels. The Axel system enables the first demonstration of FPGAs, GPUs, and CPUs running collaboratively for N-body simulation. Performance improvements of 4.4 to 22.7 times have been achieved using our approach, which shows that the Axel system can combine the benefits of FPGA specialization, GPU parallelism, and the scalability of computer clusters.
Article
This paper presents an effective field-programmable gate array (FPGA)-based hardware implementation of a parallel key searching system for brute-force attacks on RC4 encryption. The design employs several novel key scheduling techniques to minimize the total number of cycles for each key search and uses on-chip memories of the FPGA to maximize the number of key searching units per chip. Based on the design, a total of 176 RC4 key searching units can be implemented in a single Xilinx XC2VP20-5 FPGA chip, which currently costs only a few hundred U.S. dollars. Operating at a 47-MHz clock rate, the design can achieve a key searching speed of $1.07 \times 10^{7}$ keys per second. Breaking 40-bit RC4 encryption requires only around 28.5 h.
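The 28.5-hour figure follows directly from the stated throughput: a 40-bit key space holds $2^{40} \approx 1.10 \times 10^{12}$ keys, and dividing by $1.07 \times 10^{7}$ keys per second gives about $1.03 \times 10^{5}$ seconds. The calculation below reproduces this worst case (the expected time to find the key is half of it). It also derives the implied average number of clock cycles per key per searching unit from the chip-level figures; that second number is our inference, not a value reported in the abstract.

```python
KEY_BITS = 40
KEYS_PER_SECOND = 1.07e7  # reported chip-level search rate
UNITS = 176               # reported key-searching units per XC2VP20-5
CLOCK_HZ = 47e6           # reported clock rate


def exhaustive_hours(key_bits: int, keys_per_second: float) -> float:
    """Worst-case time in hours to try every key."""
    return 2**key_bits / keys_per_second / 3600


def cycles_per_key(units: int, clock_hz: float, keys_per_second: float) -> float:
    """Implied average clock cycles each unit spends per key."""
    return units * clock_hz / keys_per_second


if __name__ == "__main__":
    worst = exhaustive_hours(KEY_BITS, KEYS_PER_SECOND)
    print(f"worst case: {worst:.1f} h, expected: {worst / 2:.1f} h")
    print(f"about {cycles_per_key(UNITS, CLOCK_HZ, KEYS_PER_SECOND):.0f} cycles per key per unit")
```

Roughly 773 cycles per key per unit is consistent with each unit running RC4's 256-step key schedule plus some keystream generation and checking, though the abstract does not break the cycle count down.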
S. Kwok and E. Lam, "Effective uses of FPGAs for bruteforce attack on RC4 ciphers," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 8, pp. 1096–1100, 2008.