Content uploaded by Marcelo Kaihara

Author content

All content in this area was uploaded by Marcelo Kaihara on Jun 04, 2018

Content may be subject to copyright.

The selection of the elements of the bases in an RNS modular multiplication method is crucial and has a great impact on the overall performance. This work proposes specific sets of optimal RNS moduli with elements of Hamming weight three whose inverses used in the MRS reconstruction have very small Hamming weight. This property is exploited in RNS base conversions to completely remove the products and replace them with a few additions/subtractions and shifts, reducing the time complexity of modular multiplication. These bases are specially crafted for computation with operands of 256 bits or more and are suitable for cryptographic applications such as ECC protocols.
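To illustrate why a constant of small Hamming weight removes the need for a multiplier, the following minimal sketch multiplies a residue by a sparse signed-binary constant using only shifts, additions and subtractions. The function name `mul_by_sparse_constant`, the constant and the modulus are illustrative assumptions, not values taken from the paper, and the final reduction uses Python's generic `%` operator rather than a dedicated reduction circuit.

```python
def mul_by_sparse_constant(x, terms, modulus):
    """Return (x * c) % modulus, where c = sum(sign * 2**shift for sign, shift in terms).

    Only shifts, additions and subtractions are used to form x * c.
    """
    acc = 0
    for sign, shift in terms:
        # Each non-zero signed digit of c costs one shift and one add/subtract.
        acc += sign * (x << shift)
    # Generic reduction, used here only to keep the sketch short.
    return acc % modulus


# Hypothetical example values (not from the paper).
TERMS = [(+1, 9), (-1, 3), (+1, 0)]   # c = 2**9 - 2**3 + 1 = 505
MODULUS = 2**16 - 2**4 - 1            # a modulus with three non-zero signed digits

if __name__ == "__main__":
    print(mul_by_sparse_constant(12345, TERMS, MODULUS))
```

The cost of the product grows with the number of non-zero digits in the constant, which is why inverses of very small Hamming weight are attractive.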

... In this part we exhibit our results for the research on Solinas numbers: while every number can be written as $x = \sum_i x_i 2^i$ where $x_i \in \{-1, 0, 1\}$, Solinas numbers are such that the number of non-zero $x_i$ is low, where $w$, called the weight of $x$, is the number of non-zero $x_i$. [39] exhibited how to use those numbers efficiently: in particular, they showed how one can avoid multiplications by those numbers and replace them with more efficient additions, shifts and subtractions. However, [39] only exhibited RNS bases with 6 integers. ...
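As a small illustration of the signed-digit representation and its weight, the sketch below computes the non-adjacent form (NAF) of a non-negative integer, which is one standard way to obtain digits in {−1, 0, 1} with few non-zero entries. The function names `naf` and `weight` are assumptions made for this example; the cited works may use a different recoding.

```python
def naf(x):
    """Return the non-adjacent form of x >= 0 as signed digits, least significant first."""
    digits = []
    while x > 0:
        if x & 1:
            # Choose +1 when x = 1 (mod 4) and -1 when x = 3 (mod 4),
            # so that the next digit is guaranteed to be zero.
            d = 2 - (x % 4)
            x -= d
        else:
            d = 0
        digits.append(d)
        x >>= 1
    return digits


def weight(digits):
    """Number of non-zero signed digits."""
    return sum(1 for d in digits if d != 0)


if __name__ == "__main__":
    n = 2**16 - 2**4 - 1
    print(naf(n), weight(naf(n)))
```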

Residue Number Systems (RNS) are proven to be effective in speeding up computations involving additions and products. For these representations, there exist efficient modular reduction algorithms that can be used in the context of arithmetic over finite fields or modulo large numbers, especially in cryptographic engineering. Their independence allows random draws of bases, which also makes it possible to protect against side-channel attacks, or even to detect them using redundancy. These systems are easily scalable; however, the existence of large bases for some specific uses remains a difficult question. In this article, we present four techniques to extract RNS bases from specific sets of integers, giving better performance and flexibility than previous works in the literature. While our techniques do not efficiently solve every possible case, we provide techniques to provably and efficiently find the largest possible available RNS bases in several cases, improving the state of the art on various works of the recent literature.
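The channel-wise nature of RNS arithmetic can be sketched as follows: an integer is mapped to its residues modulo pairwise coprime moduli, additions and products are done independently per channel, and the result is recovered with the Chinese remainder theorem. The base `[251, 253, 255, 256]` and all function names are illustrative assumptions, and the sketch requires Python 3.8 or later for `math.prod` and the modular inverse via `pow`.

```python
from math import prod


def to_rns(x, moduli):
    """Map an integer to its residues in each channel."""
    return [x % m for m in moduli]


def rns_add(a, b, moduli):
    """Channel-wise addition; no carries cross between channels."""
    return [(ai + bi) % m for ai, bi, m in zip(a, b, moduli)]


def rns_mul(a, b, moduli):
    """Channel-wise multiplication; each channel is independent."""
    return [(ai * bi) % m for ai, bi, m in zip(a, b, moduli)]


def from_rns(residues, moduli):
    """Reconstruct the integer in [0, M) with the Chinese remainder theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # pow(Mi, -1, m) is the inverse of Mi modulo m
    return total % M


# Illustrative pairwise coprime base (an assumption, not taken from the cited works).
BASE = [251, 253, 255, 256]

if __name__ == "__main__":
    x, y = 123456, 7890
    print(from_rns(rns_mul(to_rns(x, BASE), to_rns(y, BASE), BASE), BASE) == x * y)
```

The result is exact only while it stays below the product of the moduli, which is why larger operands need larger bases.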

... Another way to perform the MRS-to-RNS conversion is to compute Equation 1.18 directly. For example, the authors of [BKP09] proceed this way and prove that the computation in Equation 1.18 can be performed through shifts and additions/subtractions if the selected moduli $m_i$ and $m'_j$ are pseudo-Mersenne numbers $2^w - c$ with $c$ of small Hamming weight. However, their method involves a lot of data dependencies between the channels, and hence hinders the ability to exploit the inherent parallelism of RNS. ...
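The benefit of pseudo-Mersenne moduli can be sketched with a reduction routine that relies on the congruence 2^w ≡ c (mod 2^w − c): the high part of the operand is folded back onto the low part after multiplying by c, and that multiplication is a few shifted additions when c is sparse. The function name, the parameter `c_shifts` (bit positions of the non-zero bits of c) and the example values are assumptions for illustration, not the method of [BKP09].

```python
def reduce_pseudo_mersenne(x, w, c_shifts):
    """Reduce x >= 0 modulo m = 2**w - c, where c = sum(1 << s for s in c_shifts).

    Assumes 0 < c < 2**(w - 1) so that the folding loop shrinks x quickly.
    """
    c = sum(1 << s for s in c_shifts)
    m = (1 << w) - c
    mask = (1 << w) - 1
    while x >> w:
        hi, lo = x >> w, x & mask
        # hi * c computed with shifts and additions only (c is sparse).
        folded = sum(hi << s for s in c_shifts)
        x = folded + lo               # valid because 2**w = c (mod m)
    while x >= m:                     # at most a couple of final subtractions
        x -= m
    return x


if __name__ == "__main__":
    # Hypothetical parameters: w = 16, c = 2**4 + 1, so m = 65519.
    print(reduce_pseudo_mersenne(123456789, 16, [4, 0]))
```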

Asymmetric cryptosystems are implemented in RNS using a quantity of hardware resources corresponding to the size of the cryptographic operands. In this thesis we propose a new approach to perform RNS implementations of asymmetric cryptosystems that leads to a flexible utilization of hardware resources. We start with describing a new method to perform base extensions which are crucial operations in RNS implementations of asymmetric cryptosystems. The proposed base-extension method, based on a hierarchical approach for computing the Chinese remainder theorem, introduces a reduction of the theoretical cost. Our FPGA implementations using HLS show an area and time gain compared with the state-of-the-art method. Then, we demonstrate the practicality of our new RNS-implementation approach on the two base-extension methods. Last, elliptic curve scalar multiplications based on the two base-extension methods are implemented using our RNS-implementation approach. Our FPGA implementations use a flexible quantity of hardware resources. Besides, although comparable with state-of-the-art ones in area vs. time trade-offs, most of our solutions are much smaller.

... As is typically implemented in the literature and in order to avoid excessive operations unrelated to the scalar multiplication, values related only to the RNS base moduli are precomputed and stored in memory for use in BE and random base permutation algorithms. Following the approach of [ESJ+13] and [BKP09], an RNS realization with four-moduli bases (n = 4) was used in all four RNS implementations. For all the above approaches, a GF(p) twisted Edwards EC was adopted with a = 1 and d = 2 where $p = 2^{192} - 2^{64} - 1$. Twisted Edwards curves were chosen instead of Weierstrass ones since the former have never been tested under the RNS arithmetic framework. ...

The Residue Number System (RNS) arithmetic is gaining grounds in public key cryptography, because it offers fast, efficient and secure implementations over large prime fields or rings of integers. In this paper, we propose a generic, thorough and analytic evaluation approach for protected scalar multiplication implementations with RNS and traditional Side Channel Attack (SCA) countermeasures in an effort to assess the SCA resistance of RNS. This paper constitutes the first robust evaluation of RNS software for Elliptic Curve Cryptography against electromagnetic (EM) side-channel attacks. Four different countermeasures, namely scalar and point randomization, random base permutations and random moduli operation sequence, are implemented and evaluated using the Test Vector Leakage Assessment (TVLA) and template attacks. More specifically, variations of RNS-based Montgomery Powering Ladder scalar multiplication algorithms are evaluated on an ARM Cortex A8 processor using an EM probe for acquisition of the traces. We show experimentally and theoretically that new bounds should be put forward when TVLA evaluations on public key algorithms are performed. On the security of RNS, our data and location dependent template attacks show that even protected implementations are vulnerable to these attacks. A combination of RNS-based countermeasures is the best way to protect against side-channel leakage.

... The point is that the moduli should allow for simple modulo reductions (a typical implementation of the Euclidean algorithm to find the remainder of a division is not a realistic option). The designer has a lot of options, and many moduli sets for efficient modulo reduction have been proposed [7,67]. We review just one of them to illustrate how efficient arithmetic can be achieved. ...

In the last few years, the ancient residue number system has gained renewed scientific interest and has emerged as an interesting alternative in the field of secure hardware implementations. In this survey, however, we investigate some modern and non-typical applications of RNS in the areas of post-quantum cryptography, cloud infrastructures, and homomorphic encryption. We examine the techniques to incorporate residue arithmetic in these schemes as well as the means to mechanize secure and robust RNS cloud solutions. This survey serves, hopefully, as a soft introduction to residue arithmetic and provides insights for future research and open problems that could be addressed by RNS efficiently.

With the rapid development and application of artificial intelligence and blockchain, the requirement for information and data security has also increased, in which public-key cryptography, such as Rivest-Shamir-Adleman (RSA) cryptography, plays a significant role. Modular exponentiation is fundamental in computer arithmetic and is widely applied in cryptography, such as ElGamal cryptography, the Diffie–Hellman key exchange protocol, and RSA cryptography. The implementation of modular exponentiation in a residue number system leads to high parallelism in computation and has been applied in many hardware architectures. While most residue number system (RNS)-based architectures utilize the RNS Montgomery algorithm with two residue number systems, the recent modular multiplication algorithm with sum residues performs modular reduction in only one residue number system with about the same parallelism. In this work, it is shown that high-performance modular exponentiation and RSA cryptography can be implemented in RNS. Both the algorithm and architecture are improved to achieve high performance with extra area overheads, where a 1024-bit modular exponentiation can be completed in 0.567 ms on a Xilinx XC6VLX195t-3 platform, costing 26 489 slices, 87 357 LUTs, 363 dedicated multipliers of $18 \times 18$ bits, and 65 block RAMs.

This paper presents a new architecture to perform modular multiplication using Montgomery multiplication in Residue Number System (RNS). Two new balanced four-moduli RNS bases are selected, and the process of RNS Montgomery multiplication, including the required RNS to RNS conversions, is designed and implemented. Field-programmable gate array (FPGA) implementation using the adder-based residue/binary to binary/residue converters and ROM-free in the pipelined architecture has achieved better performance compared to the state-of-the-art works in literature.

Modular multiplication is used in a wide range of applications. Most of the existing modular multiplication algorithms in the literature focus on large moduli. However, those solutions oriented toward large moduli are also used to implement modular arithmetic for applications requiring moduli smaller than a word size, i.e., 32/64 bits. As it happens, a large majority of applications use word-size modular arithmetic. In this work, we propose a new modular multiplication designed to be computed on one word size only. For word-size moduli, in a large majority of instances, our solution outperforms other existing solutions, including generalist solutions like Montgomery's and Barrett's modular multiplication as well as classes of moduli like Mersenne, Pseudo-Mersenne, Montgomery-Friendly and Generalized Mersenne.

Hardware realization of public-key cryptosystems often entails Montgomery modular multiplication (MMM), which is more efficient in residue number systems (RNS). A large pool of co-prime moduli allows for a higher number of dynamically changeable moduli-set pairs for the required base extension, leading to ultra-wide key-lengths to accommodate the indispensable resistance to differential power-analysis (DPA) attacks. The moduli are often of the form $2^r - \delta$, where $r$ denotes the width of residue channels. In a previous relevant RNS MMM design, with $r = 64$, the probability of a successful DPA attack is less than $2^{-66}$, where efficient arithmetic is obtained only for a limited set of moduli that are insufficient for key-lengths over 1024 bits. Here we propose a free-$\delta$ RNS MMM scheme, for up to 8192-bit key-lengths and fast 16-bit residue channels, based on the proposed $\delta$-independent modulo-($2^r - \delta$) adders and multipliers. Moreover, we propose a special method for moduli selection that is required for base extension, leading to the same aforementioned DPA-resistance measure and much lower measures for key-lengths over 1024. The implementation results show 82, 69, and 44 percent less RSA delay for key-lengths 512, 1024, and 2048, respectively, of the home designs versus the 512-bit main reference design, and more than 5 and 100 percent for 4096 and 8192 key-lengths, respectively, all per 512-bit encrypted messages.

In this paper we define steganography over Redundant Residue Number System (RRNS) Codes. We describe distortion-less RRNS based steganographic schemes, analyse their corresponding embedding capacities, discuss their linearity and compare them with well known steganographic protocols. Specifically, we take advantage of the redundancy and correction capacity of these codes to hide secret information in such way that the number of the altered residues does not exceed half the redundancy of the code.

In previous works (Cardarilli et al., 2000) we performed different experiments implementing FIR filtering structures. Each filter was implemented using both the two's complement system (TCS) and the residue number system (RNS) number representations. The comparison of these two implementations allows us to conclude that, for these applications, the RNS uses less power than the TCS counterpart. The aim of the present paper is to highlight the reasons for this reduction in power consumption.

In this paper we show how the usage of Residue Number Systems (RNS) can easily be turned into a natural defense against many side-channel attacks (SCA). We introduce a Leak Resistant Arithmetic (LRA), and present its capacities to defeat timing, power (SPA, DPA) and electromagnetic (EMA) attacks.

Modern Graphics Processing Units (GPU) have reached a dimension with respect to performance and gate count exceeding conventional Central Processing Units (CPU) by far. Many modern computer systems include – beside a CPU – such a powerful GPU which runs idle most of the time and might be used as cheap and instantly available co-processor for general purpose applications.
In this contribution, we focus on the efficient realisation of the computationally expensive operations in asymmetric cryptosystems on such off-the-shelf GPUs. More precisely, we present improved and novel implementations employing GPUs as accelerator for RSA and DSA cryptosystems as well as for Elliptic Curve Cryptography (ECC). Using a recent Nvidia 8800GTS graphics card, we are able to compute 813 modular exponentiations per second for RSA or DSA-based systems with 1024 bit integers. Moreover, our design for ECC over the prime field P-224 even achieves the throughput of 1412 point multiplications per second.

Let N > 1. We present a method for multiplying two integers (called N-residues) modulo N while avoiding division by N. N-residues are represented in a nonstandard way, so this method is useful only if several computations are done modulo one N. The addition and subtraction algorithms are unchanged. 1. Description. Some algorithms (1), (2), (4), (5) require extensive modular arithmetic. We propose a representation of residue classes so as to speed modular multiplication without affecting the modular addition and subtraction algorithms. Other recent algorithms for modular arithmetic appear in (3), (6). Fix N > 1. Define an N-residue to be a residue class modulo N. Select a radix R coprime to N (possibly the machine word size or a power thereof) such that R > N and such that computations modulo R are inexpensive to process. Let $R^{-1}$ and $N'$ be integers satisfying 0 ... N then return t − N else return t. To validate REDC, observe $mN \equiv TN'N \equiv -T \pmod{R}$, so t is an integer. Also, $tR \equiv T \pmod{N}$, so $t \equiv TR^{-1} \pmod{N}$. Thirdly, $0 \le T + mN < RN + RN$, so $0 \le t < 2N$. If R and N are large, then T + mN may exceed the largest double-precision value. One can circumvent this by adjusting m so $-R < m \le 0$. Given two numbers x and y between 0 and N − 1 inclusive, let z = REDC(xy). Then $z \equiv (xy)R^{-1} \pmod{N}$, so $(xR^{-1})(yR^{-1}) \equiv zR^{-1} \pmod{N}$. Also, $0 \le z < N$, so z is the product of x and y in this representation. Other algorithms for operating on N-residues in this representation can be derived from the algorithms normally used. The addition algorithm is unchanged, since $xR^{-1} + yR^{-1} \equiv zR^{-1} \pmod{N}$ if and only if $x + y \equiv z \pmod{N}$. Also unchanged are
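The REDC step validated in this abstract can be sketched in Python. The excerpt is truncated where the constants and the routine are defined, so the sketch assumes the standard definitions: $RR^{-1} - NN' = 1$, $m = (T \bmod R)\,N' \bmod R$ and $t = (T + mN)/R$. The function names and the small test values are illustrative assumptions.

```python
def montgomery_setup(N, R):
    """Return (R_inv, N_prime) with R*R_inv - N*N_prime == 1.

    Assumes N > 1, R > N and gcd(R, N) == 1.
    """
    R_inv = pow(R, -1, N)               # inverse of R modulo N (Python 3.8+)
    N_prime = (R * R_inv - 1) // N      # exact division by construction
    return R_inv, N_prime


def redc(T, N, R, N_prime):
    """Return T * R^{-1} mod N for 0 <= T < R*N, without dividing by N."""
    m = ((T % R) * N_prime) % R         # chosen so that T + m*N is divisible by R
    t = (T + m * N) // R                # exact division; cheap when R is a power of 2
    return t - N if t >= N else t


if __name__ == "__main__":
    N, R = 97, 128                      # hypothetical small parameters
    _, N_prime = montgomery_setup(N, R)
    x, y = 45, 76
    x_bar, y_bar = (x * R) % N, (y * R) % N   # operands in Montgomery form
    z_bar = redc(x_bar * y_bar, N, R, N_prime)
    print(redc(z_bar, N, R, N_prime) == (x * y) % N)
```

Converting operands into and out of this form costs extra reductions, which matches the remark that the method pays off only when several computations are done modulo the same N.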

A generalization of a new generic 4-modulus base for residue number systems (RNS) is presented in this paper. An efficient RNS to binary conversion algorithm and a hierarchical architecture for these bases are also described. The key features of our conversion architecture compared to previous published architectures of the same output range are a larger moduli set selection and savings on the critical delay, area and power. The FPGA implementation and the detailed proof supporting it are also discussed.

An encryption method is presented with the novel property that publicly revealing an encryption key does not thereby reveal the corresponding decryption key. This has two important consequences: 1. Couriers or other secure means are not needed to transmit keys, since a message can be enciphered using an encryption key publicly revealed by the intended recipient. Only he can decipher the message, since only he knows the corresponding decryption key. 2. A message can be "signed" using a privately held decryption key. Anyone can verify this signature using the corresponding publicly revealed encryption key. Signatures cannot be forged, and a signer cannot later deny the validity of his signature. This has obvious applications in "electronic mail" and "electronic funds transfer" systems. A message is encrypted by representing it as a number M, raising M to a publicly specified