Fig 1 - uploaded by S. Pontarelli
Illustration of the parallel pipeline implementation.

Source publication
Article
Full-text available
Cuckoo hashing has proven to be an efficient option to implement exact matching in networking applications. It provides good memory utilization and a deterministic worst-case access time. The continuous increase in the speed and complexity of networking devices creates a need for higher-throughput exact matching in many applications. In this paper, a new...
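For readers unfamiliar with the scheme, the sketch below illustrates plain d-ary cuckoo hashing and why its lookup cost is deterministic (at most d probes, one per table). It is a minimal illustration under assumptions of this example (table sizes, hash construction, eviction policy), not the paper's pipelined design.

    # Minimal sketch of d-ary cuckoo hashing (illustrative only; not the
    # paper's pipelined design). D, SIZE and the eviction policy are
    # assumptions made for this example.
    import hashlib

    D = 2        # number of tables, one hash function per table (assumed)
    SIZE = 1024  # buckets per table (assumed)
    tables = [[None] * SIZE for _ in range(D)]

    def h(i, key):
        # Derive one independent hash per table from a salted digest.
        digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
        return int.from_bytes(digest[:4], "big") % SIZE

    def lookup(key):
        # Deterministic worst case: exactly D probes, one per table.
        return any(tables[i][h(i, key)] == key for i in range(D))

    def insert(key, max_kicks=32):
        for kick in range(max_kicks):
            for i in range(D):
                slot = h(i, key)
                if tables[i][slot] is None:
                    tables[i][slot] = key
                    return True
            # All D candidate slots occupied: evict an occupant and retry
            # the insertion with the displaced key.
            i = kick % D
            slot = h(i, key)
            key, tables[i][slot] = tables[i][slot], key
        return False  # give up; a real table would rehash or resize

As the figure caption and abstract indicate, the paper's contribution is to organize the d probes as parallel pipelines rather than the sequential loop shown here.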

Similar publications

Article
Full-text available
Security problems introduced by the rapid increase in the deployment of Internet-of-Things devices can be overcome only with lightweight cryptographic schemes and modules. A compact prime-field (GF(p)) elliptic curve digital signature algorithm (ECDSA) engine suitable for use in such applications is presented. The generic architecture of the engine makes it...

Citations

... Concurrently, researchers have vigorously pursued parallel structures to bolster hash function efficiency. These efforts encompass a multi-compression-based parallel iteration [30,31], a shuffle-exchange-network-driven parallel structure [32], Chebyshev-Halley-method-based parallel hashes with variable parameters [33], a parallel hash inspired by a novel chaotic system [34], a coupled-map-lattice-based parallel function [35], a tree-patterned parallel hash leveraging internal variable input lengths [36], and the parallel d-pipeline cuckoo hash algorithm [37]. While these parallel structures have contributed to varying degrees of efficiency gains, they also force a delicate balance between efficiency and security. ...
Preprint
Full-text available
The cryptographic hash function stands as a cornerstone among the trio of essential cryptographic algorithms that are ubiquitously utilized across blockchain technology, digital signature applications, cloud storage solutions, and numerous other domains. A series of MD4-inspired hash functions, including RIPEMD, RIPEMD-128, MD5, and SHA-1, have been critically evaluated and deemed insufficient in terms of security [10-13], emphasizing the importance of heightened vigilance in safeguarding the integrity of cryptographic hash functions. Notably, most prevalent hash functions rely heavily on inefficient serial architectures, which limit performance and scalability. To address these shortcomings, this paper introduces a cryptographic hash function predicated on a parallel confusion and multi-compression structure (PCMCH). The proposed method fills the input data through a parallel confusion compression mechanism, concurrently executing multi-faceted confusion compression on each message block. Furthermore, it expedites message diffusion by tuning adaptable permutation parameters, enhancing both the speed and efficacy of the process. Exhaustive experimental analysis underscores the security characteristics of the proposed hash function, including irregularity, the avalanche effect, high information entropy, and robust collision resistance. Moreover, its performance surpasses that of existing parallel hash functions, marking it as a promising contender that offers superior efficiency and security, and presenting a viable alternative for applications requiring heightened cryptographic safeguards.
... Therefore, researchers have focused on improving the efficiency of hash functions through parallel structures. Parallel hash iteration structures based on multi-compression [31], shuffle-exchange networks [32], the Chebyshev-Halley method with variable parameters [33], new chaotic systems [34], coupled map lattices [35], internal variable input length [36], and cuckoo hash algorithms [37] have been designed to improve efficiency. All these parallel structures can enhance the efficiency of the algorithm to varying degrees, but they also have an inevitable impact on security [38]. ...
Preprint
Full-text available
The development of a cryptographic hash algorithm is a crucial task due to its numerous practical applications, such as digital signatures, blockchain, and distributed systems. Constructing a novel and efficient hash algorithm that meets high security requirements is a challenging endeavor. This study introduces a cryptographic parallel hash algorithm based on cellular automata and a stochastic diffusion model, referred to as PCASD. The article delves into the rules of cellular automata, classifies the 88 equivalence classes of rules, and utilizes random chaotic rules to generate keys for the iterative processes. The stochastic diffusion model optimizes parameters to achieve optimal security performance indicators. The parallel iteration structure allows different branches to execute simultaneously, ultimately producing a hash value. The experimental results demonstrate that the proposed parallel hash algorithm outperforms popular hash functions in terms of randomness, avalanche, information entropy, collision resistance, and efficiency, indicating its practical feasibility.
... Although they are stall-free and relatively fast, the maximum number of items that can be monitored is limited, especially with larger integer key sizes. Several works have demonstrated that it is possible to expand the total number of monitored items using key-value store approaches based on hash tables that utilize the abundant embedded memory resources on an FPGA chip [15,16]. However, the performance of such approaches suffers significantly because of hash collisions and the pipeline stalls needed for updating the monitored items. ...
... Optimizing memory accesses and the RAM block geometry can also enhance the performance of a hash table. For example, breaking the RAM block into several pipelined banks allows for several concurrent memory accesses [16,33]. However, real data streams are typically skewed, causing contention on some of the memory banks and, as a result, preventing meaningful performance gains. ...
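A hedged sketch of the banking idea described in this excerpt: splitting the RAM into B independently addressable banks allows up to B accesses per cycle, but only when the accesses map to distinct banks. The bank count and the bank-selection rule below are assumptions for illustration.

    from collections import Counter

    NUM_BANKS = 4  # assumed bank count

    def bank_of(hash_value):
        # Low-order hash bits select the bank (a common convention).
        return hash_value % NUM_BANKS

    def cycles_needed(hash_values):
        # Accesses to distinct banks proceed concurrently; accesses that
        # collide on a bank serialize, so the busiest bank sets the time.
        per_bank = Counter(bank_of(v) for v in hash_values)
        return max(per_bank.values(), default=0)

    print(cycles_needed([0, 1, 2, 3]))   # uniform keys: 1 cycle
    print(cycles_needed([0, 4, 8, 12]))  # skewed keys, one bank: 4 cycles

The second call shows the skew problem the excerpt raises: a hot bank serializes the whole batch and erases the parallel gain.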
Article
Full-text available
This paper presents a novel approach for accelerating the top-k heavy hitters query in data streams using Field Programmable Gate Arrays (FPGAs). Current hardware acceleration approaches rely on the direct and strict mapping of software algorithms into hardware, limiting their performance and practicality due to the lack of hardware optimizations at an algorithmic level. The presented approach optimizes a well-known software algorithm by carefully relaxing some of its requirements to allow for the design of a practical and scalable hardware accelerator that outperforms current state-of-the-art accelerators while maintaining near-perfect accuracy. This paper details the design and implementation of an optimized FPGA accelerator specifically tailored for computing the top-k heavy hitters query in data streams. The presented accelerator is entirely specified at the C language level and is easily reproducible with High-Level Synthesis (HLS) tools. Implementation on Intel Arria 10 and Stratix 10 FPGAs using Intel HLS compiler showed promising results—outperforming prior state-of-the-art accelerators in terms of throughput and features.
... Paynter and Kocak [4] reduced power consumption by applying the hash functions successively rather than in parallel. Pontarelli et al. [5] proposed a parallel d-pipeline scheme that increases the throughput of the Cuckoo filter and is easy to implement on an FPGA. Reviriego et al. [6] combined the Bloom filter and the Cuckoo filter in a single data structure to reduce insertion times. ...
Article
Network traffic control and classification have become increasingly dependent on deep packet inspection (DPI) approaches, which are the most precise techniques for intrusion detection and prevention. However, increasing traffic volumes and link speeds exert considerable pressure on DPI techniques to process packets with high performance in restricted available memory. To overcome this problem, we propose the dual Cuckoo filter (DCF), a data structure based on the Cuckoo filter (CF). The CF can be extended to a parallel mode called the parallel Cuckoo filter (PCF). The proposed data structure employs an extra hash function to obtain two potential indices of entries. The DCF magnifies the advantages of the CF with no additional memory. Moreover, it can also be extended to a parallel mode, resulting in a data structure referred to as the parallel dual Cuckoo filter (PDCF). The implementation results show that using the DCF and PDCF as identification tools in a DPI system yields time improvements of up to 2% and 30% over the CF and PCF, respectively.
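As context for the DCF's extra hash function, the sketch below shows the standard candidate-bucket computation of a Cuckoo filter (partial-key cuckoo hashing). The sizes and hash choices are assumptions, and the DCF described above adds a further hash function on top of this baseline scheme.

    import hashlib

    NUM_BUCKETS = 1 << 10  # power of two keeps the XOR trick reversible
    FP_BITS = 8            # fingerprint width (assumed)

    def _h(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

    def fingerprint(key: bytes) -> int:
        # Reserve 0 as the "empty slot" marker.
        return _h(b"fp" + key) % (1 << FP_BITS) or 1

    def candidate_buckets(key: bytes):
        fp = fingerprint(key)
        i1 = _h(key) % NUM_BUCKETS
        # XOR with a hash of the fingerprint: during eviction, either
        # bucket can be recovered from the other without the original key.
        i2 = (i1 ^ _h(fp.to_bytes(2, "big"))) % NUM_BUCKETS
        return fp, i1, i2

    print(candidate_buckets(b"flow-id-1234"))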
... Designs allow unrestricted key-value pair insertions and removals with a constant average cost per operation. Several research efforts have focused on "parallelizing" hash table implementations by exploiting certain features of the hash table, such as the fact that it has many partitions [11]. ...
Preprint
Full-text available
A hash table is a basic data structure for searching and retrieving data quickly. It is an important part of Artificial Intelligence (AI)/Machine Learning (ML) applications and advanced graph analytics, and an effective hash function is what makes hashing efficient. Existing hash table implementations either simplify the underlying model by supporting only a subset of hash table operations or use optimizations that result in highly data-dependent performance that is, at worst, comparable to a sequential implementation. We present a dynamic hash table that supports all hash table queries (search, insert, delete, and update) while providing p parallel queries (p > 1) per clock cycle using p processing engines (PEs) in the worst case, i.e., data-agnostic performance. Our design is scalable up to 128 PEs and supports throughput of up to 8000 Million Operations per Second (MOPS) at 325 MHz using state-of-the-art FPGAs. It supports the same set of operations as a standard hash table and achieves up to a 42.1× speedup. Using RXOR-based parallel hash tables, we achieve high throughput, low latency, and low power consumption.
... It optimized the existing pipeline structure to guarantee that all message blocks could be accessed in all cycles. This parallel implementation increased the throughput, so it was well-suited to high-speed applications (Salvatore et al. 2016). Meysam proposed a keyed parallel hashing scheme based on a new chaotic system. ...
... After 3000 independent tests, A_c and R_c are both quite close to the optimal values for all these hash functions. However, the standard deviation ΔR_c of MCPH is much smaller than those of Je et al. (2015), Wang et al. (2011), Nouri et al. (2014), Salvatore et al. (2016), Yang et al. (2019b), Gauravaram et al. (2005) and Liskov (2006), and slightly smaller than those of SHA1-160, SHA2-256 and SHA3-256, which implies that MCPH can create more random and unpredictable hash values and has stronger confusion and diffusion capabilities. If an attacker wants to carry out a statistical attack by analyzing the statistical relationship between two hash values, MCPH has the best resistance among all hash functions listed in Table 4. ...
Article
Full-text available
As a fundamental cryptographic primitive, the hash function is used in various cryptographic applications, such as cloud storage, digital signatures, blockchain and random number generation. Although hash functions have made great advances in recent years, most existing schemes are designed with serial architectures that suffer from large time consumption. From an efficiency perspective, this paper discusses different forms of optimality that can be achieved by designing a parallel structure in a hash scheme. Moreover, without affecting the optimal efficiency, the corresponding iterative structure and compression function are slightly optimized to accelerate message diffusion and the avalanche effect. The simulation results show the proposed hash scheme performs well in both efficiency and security: compared to other benchmark parallel hash functions, the proposed structure improves efficiency by 15%, and it can be used as a substitute for classic hash functions.
... However, according to statistical analysis, its avalanche property was poor [23]. Another line of compression research is the design of novel compression functions [24-26]. A one-way hash function based on a spatiotemporal chaos system was proposed by Ye [24]. Another hash function construction with changeable parameters was proposed by Salvatore [25]. Although the compression functions of these hash functions were redesigned, the output hash length of [24,25] was only 128 bits, so they could not stand up well to birthday attacks. A chaotic keyed hash function based on a feedforward-feedback nonlinear digital filter was proposed by Meysam A.; since this algorithm cannot be analyzed linearly, it can resist ciphertext search attacks [26]. ...
Article
Full-text available
Hash functions serve as fundamental cryptographic primitives and are used in numerous security fields, such as cloud audit, digital signatures, blockchain and random number generation. In recent years, cryptographers have delved into parallel hash functions to design more efficient cryptographic primitives. This paper proposes a multi-iterative parallel hash function. Moreover, inside this parallel structure, a four-branch parallel compression structure is proposed to accelerate message diffusion. Simulation results show the proposed hash scheme performs well in both efficiency and security.
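To make the structure concrete, here is a generic sketch of the k-branch parallel shape such schemes share: blocks are distributed over branches that compress independently, then a final compression combines the branch states. SHA-256 is only a stand-in for the paper's compression function, and the branch count, padding rule and combining step are all assumptions.

    import hashlib

    def parallel_hash(message: bytes, branches: int = 4, block: int = 64) -> bytes:
        # Zero-pad to a whole number of blocks (placeholder padding rule).
        if len(message) % block:
            message += b"\x00" * (block - len(message) % block)
        blocks = [message[i:i + block] for i in range(0, len(message), block)]
        # Round-robin the blocks over the branches; each branch chains its
        # own state, so the branches can run concurrently.
        states = [hashlib.sha256(bytes([b])) for b in range(branches)]
        for idx, blk in enumerate(blocks):
            states[idx % branches].update(blk)
        # Final combining compression over the branch digests.
        final = hashlib.sha256()
        for s in states:
            final.update(s.digest())
        return final.digest()

    print(parallel_hash(b"some long message" * 10).hex())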
... The reason is that the location of an element is determined by the modular result between the hash value and the table length. Consistent hashing [11], [31] relaxes this constraint so that only a small part of the stored elements must be moved when resizing the hash table, under the assumption that a bucket can accommodate multiple elements. Consistent hashing maps both the elements and the buckets of the hash table onto a ring ranging from 0 to a given large integer M. Thereafter, the elements are assigned to buckets in either clockwise or anti-clockwise order around the ring. ...
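A minimal sketch of the ring construction described above, assuming SHA-256 for ring positions and illustrative bucket names; real deployments usually add virtual nodes for load balance.

    import bisect, hashlib

    M = 2**32  # ring size (the "given large integer M")

    def ring_pos(item: bytes) -> int:
        return int.from_bytes(hashlib.sha256(item).digest()[:4], "big") % M

    class ConsistentRing:
        def __init__(self, buckets):
            # Place every bucket on the ring at the hash of its name.
            self.points = sorted((ring_pos(b), b) for b in buckets)

        def bucket_for(self, element: bytes):
            # Walk clockwise from the element to the next bucket,
            # wrapping around the end of the ring.
            i = bisect.bisect(self.points, (ring_pos(element),))
            return self.points[i % len(self.points)][1]

    ring = ConsistentRing([b"bucket-%d" % i for i in range(8)])
    print(ring.bucket_for(b"some-key"))
    # Removing one bucket only remaps the elements between that bucket
    # and its ring neighbor, which is the resizing property noted above.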
Article
Full-text available
The emergence of large-scale dynamic sets in networked and distributed applications attaches stringent requirements to approximate set representation. The existing data structures (including the Bloom filter, Cuckoo filter, and their variants) preserve a tight dependency between the cells or buckets for an element and the lengths of the filters. This dependency, however, degrades the capacity elasticity, space efficiency and design flexibility of these data structures when representing dynamic sets. In this paper, we first propose the Index-Independent Cuckoo filter (I2CF), a probabilistic data structure that decouples the dependency between the length of the filter and the indices of the buckets which store the information of elements. At its core, an I2CF maintains a consistent hash ring to assign buckets to the elements and generalizes the Cuckoo filter by providing k optional candidate buckets to each element. By adding and removing buckets adaptively, I2CF supports bucket-level capacity alteration for dynamic set representation. Moreover, in case of a sudden increase or decrease in set cardinality, we further organize multiple I2CFs into a Consistent Cuckoo filter (CCF) to provide filter-level capacity elasticity. By adding untapped I2CFs or merging under-utilized I2CFs, CCF is capable of resizing its capacity instantly. The trace-driven experiments indicate that CCF outperforms its alternatives and simultaneously realizes our design rationales for dynamic set representation, at the cost of slightly higher complexity.
... Several works have developed high-throughput hash table implementations by "parallelizing" the hash table. The "parallelization" in these works implies exploiting certain features of the hash table, such as the availability of multiple partitions [22], to increase the number of parallel queries that can be supported. However, this does not imply true parallelism, as the parallelism is highly data-dependent and the worst-case performance (for example, when all queries belong to the same partition) is similar to that of a serial implementation. ...
... However, their study is limited to lookups, and the complexity incurred by simultaneous lookups and insertions is not considered. In the hash table implementation of [22], to increase throughput, each pipeline has a different entry point, each of which corresponds to a different hash function. ...
... The authors report the performance for bulk build (static) and incremental inserts (dynamic) separately. We used a 32-bit key/value size and a random traffic pattern in the comparison, which is the same as reported by [1,22]. Reference [22] proposes parallel Cuckoo hashing on an FPGA. ...
Chapter
Hash table is a fundamental data structure that provides efficient data store and access. It is a key component in AI applications, which rely on building a model of the environment using observations and performing lookups on the model for newer observations. In this work, we develop FASTHash, a “truly” high-throughput parallel hash table implementation using FPGA on-chip SRAM. Contrary to state-of-the-art hash table implementations on CPU, GPU, and FPGA, the parallelism in our design is data-independent, allowing us to support p parallel queries (p > 1) per clock cycle via p processing engines (PEs) in the worst case. Our novel data organization and query flow techniques allow full utilization of abundant low-latency on-chip SRAM and enable conflict-free concurrent insertions. Our hash table ensures relaxed eventual consistency: inserts from a PE are visible to all PEs with some latency. We provide a theoretical worst-case bound on the number of erroneous queries (true negative search, duplicate inserts) due to relaxed eventual consistency. We customize our design to implement both static and dynamic hash tables on state-of-the-art FPGA devices. Our implementations are scalable to 16 PEs and support throughput as high as 5360 million operations per second with PEs running at 335 MHz for static hashing and 4480 million operations per second with PEs running at 280 MHz for dynamic hashing. They outperform state-of-the-art implementations by 5.7x and 8.7x, respectively.
... Nouri et al. (2014) proposed a parallel iterative structure containing different parameters automatically acquired from the position indices of the corresponding message blocks, based on the Chebyshev-Halley method. Salvatore et al. (2016) proposed a parallel hash function construction with changeable parameters. All these parallel hash schemes can process many message blocks simultaneously. ...
... For the hash functions in Nouri et al. (2014), Ghebleh (2015), Ye et al. (2016), Salvatore et al. (2016) and Yang et al. (2015), the maximum and minimum D_hash can be obtained using the same random message, and the average ... As shown in Table 7, the fluctuation of PLHF is small, which means it is more stable and has stronger resistance against random collision attacks than other algorithms with n = 256. Its security parameter, the average D_hash, is approximately equal to the optimal value 85.33, which is superior to other parallel schemes such as Je et al. (2015), Wong (2011), Nouri et al. (2014) and Salvatore et al. (2016). ...
Article
Full-text available
The rapid development of cloud computing has created enormous security challenges concerning the authenticity, integrity, availability and reliability of outsourced data. Cloud audit is an effective solution for massive data verification and provides reliable and credible authentication results. High audit efficiency is needed because real-time verification of data is necessary for most applications on the cloud. Since the hashing operation is an essential function in an audit scheme and occupies most of the audit overhead, this paper proposes a parallel iterative structure and a message padding procedure to construct a novel parallel lattice hash function (PLHF). Moreover, inside the parallel iterative structure, a lattice-based hash compression function is proposed so that the hardness of cracking PLHF reduces to solving the shortest vector problem. Based on the experimental results and security analysis, the cloud audit scheme with PLHF not only achieves significantly higher efficiency but also provides stronger security.