Article

A signed binary multiplication technique

Authors:
To read the full-text of this research, you can request a copy directly from the author.

Abstract

A technique is described whereby binary numbers of either sign may be multiplied together by a uniform process which is independent of any foreknowledge of the signs of these numbers.
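
The abstract gives no procedure, so the following is only a minimal sketch of the radix-2 recoding usually attributed to this paper, not the author's own formulation. The function name `booth_multiply` and the `bits` parameter are illustrative assumptions. The multiplier is scanned from the least significant bit; the multiplicand is subtracted where a run of ones begins and added where it ends, so the signs of the operands never need to be inspected.

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int = 8) -> int:
    """Multiply two signed integers that fit in `bits`-bit two's complement.

    Hypothetical helper for illustration; `bits` is an assumed word width.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not (lo <= multiplicand <= hi and lo <= multiplier <= hi):
        raise ValueError("operands must fit in the given width")

    # Two's-complement bit pattern of the multiplier.
    pattern = multiplier & ((1 << bits) - 1)

    product = 0
    prev_bit = 0  # implicit 0 to the right of the least significant bit
    for i in range(bits):
        bit = (pattern >> i) & 1
        if bit == 1 and prev_bit == 0:
            product -= multiplicand << i  # a run of ones starts here
        elif bit == 0 and prev_bit == 1:
            product += multiplicand << i  # a run of ones ended just below
        prev_bit = bit
    return product
```

The same loop handles every sign combination, for example `booth_multiply(-7, 3)` and `booth_multiply(7, -3)`, without a separate sign-correction step.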


... A novel multiplier organization is introduced, in which the data bits flow in one direction, and the Booth commands [1] are piggybacked on the acknowledgments flowing in the opposite direction. ...
... An architectural optimization is introduced that merges the arithmetic operations and the shift operation into the same function unit, thereby obtaining significant improvement in area, energy and speed [1]. ...
... They are often more cost-effective (and less risky) than custom hardware, particularly for low-volume applications, where the development cost of custom ICs [13] may be prohibitive. And in comparison to other types of microprocessors, DSP processors often have an advantage in terms of speed, cost, and energy efficiency [1]. ...
Article
Full-text available
This paper presents the design and implementation of a signed-unsigned Modified Booth Encoding (SUMBE) multiplier. The existing Modified Booth Encoding (MBE) multiplier and the Baugh-Wooley multiplier perform multiplication on signed numbers only. The modified Booth encoder circuit generates half the partial products in parallel. The SUMBE multiplier is obtained by extending the sign bit of the operands and generating an additional partial product. A Carry Save Adder (CSA) tree and a final Carry Look-ahead (CLA) adder are used to speed up the multiplier operation. Since signed and unsigned multiplication are performed by the same multiplier unit, the required hardware and chip area are reduced, which in turn reduces the power dissipation and cost of a system. The proposed radix-2 modified Booth algorithm MAC with SPST gives a factor of 5 less delay and 7% less power consumption as compared to array MAC.
... Generally, multipliers could be classified into various types, such as array [11,12], Booth [13,14], carry-save, and Wallace tree [15,16], according to the methods used to produce, pass, and compress the partial products. In an array multiplier, the partial product is generated by the one-bit multiplication of the multiplicand and multiplier, mostly conducted by AND gates. ...
... Booth encoder methods [13], on the other hand, encode the input sequence according to a certain concept. An improved version of Booth encoding, known as modified Booth encoding (MBE), was proposed [14]. ...
... Table 10 illustrates the radix-4 MBE pattern, where the multiplicand is encoded in groups of 3 bits. The modified Booth encoder methods and Wallace tree combine to form the modified Booth Wallace tree (MBW) [13,32,33]. Table 10. ...
Article
Full-text available
With the rapid development of information technology, the demand for high-speed and low-power technology for digital signal processing is increasing. Full adders and multipliers are the basic components of signal processing technology. Pass-transistor logic is a promising method for implementing full adder and multiplier circuits due to the low count of transistors and low-power characteristics. In this paper, we present a novel full adder based on pass transistors. The proposed full adder consists of 18 transistors. The post-layout simulation shows a 13.78% of power reduction compared to conventional CMOS full adders. Moreover, we propose an 8-bit signed multiplier based on the proposed full adder. The post-layout simulation shows an 8% power reduction compared to the multiplier produced by the Design Compiler synthesis tool. Compared to the existing work with a similar process, our work achieved only 19.02% of the power-delay product and 3.5% of the area-power product.
... The representation was proposed in 1951 by Booth [3]. And after 10 years (George W. ...
... The Hamming weight of the NAF(k) is minimal among all signed-digit representations of k [28]. Fortunately, the number of bits in the NAF(k) is at most one more than the number of bits in the binary form of k. Calculating the NAF(k) of any integer is illustrated in Algorithm 2 [3]. 2. If $k \equiv 1 \pmod{4}$, then 1 will be taken, and the algorithm continues with $(k-1)/2$, which is an even integer that guarantees a 0 in the next step. ...
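
The symbols in the snippet above were lost in extraction, but the stated properties (minimal Hamming weight, at most one extra digit, a rule based on the value modulo 4) match the non-adjacent form (NAF). On that assumption, here is a minimal sketch of the recoding; the function name `naf` is illustrative and this is not the cited paper's Algorithm 2.

```python
def naf(k: int) -> list[int]:
    """Return signed digits in {-1, 0, 1} of k >= 0, least significant first.

    Sketch of non-adjacent-form recoding (assumed to be what the snippet
    describes); no two neighbouring digits are both nonzero.
    """
    if k < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = []
    while k > 0:
        if k % 2 == 1:
            # k = 1 (mod 4) -> digit 1; k = 3 (mod 4) -> digit -1.
            digit = 2 - (k % 4)
            k -= digit  # k is now divisible by 4, so the next digit is 0
        else:
            digit = 0
        digits.append(digit)
        k //= 2
    return digits
```

For instance, 7 is `111` in binary (three nonzero digits), while `naf(7)` yields the digits of 8 − 1, which has only two.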
Article
Full-text available
Scalar multiplication is the fundamental operation in the elliptic curve cryptosystem. It involves calculating the integer multiple of a specific elliptic curve point. It involves three levels: field, point, and scalar arithmetic. Scalar multiplication will be significantly more efficient overall if the final level is improved. By reducing the hamming weight or the number of operations in the scalar representation, one can raise the level of scalar arithmetic. This paper reviews some of the algorithms and techniques that improve the elliptic curve scalar multiplication in terms of the third level.
... Posit multiplication can be accomplished by using any generalized multiplication algorithm like Booth [10], shift and add, array etc. When 2 posits are to be multiplied, an XOR operation is performed on the sign bits to produce the resultant sign bit, the effective exponents (formed by the regime and exponent fields) are added, and the mantissa bits (along with the implied bit) are multiplied to produce the resultant exponent and mantissa bits respectively. ...
... For a uniform comparison between proposed designs and existing works, Verilog HDL has been utilized. Hardware synthesis results of existing designs E1 [8], E2 [10], and proposed designs (P1 and P2) performed using Cadence RTL compiler v7.1 E2 at TSMC 90 nm process node (slow-normal library) have been shown in Tables 3,4,5 and 6. Two regime sizes-2 and 4 have been selected for comparison of the hardware synthesis results of the proposed and existing designs to highlight the advantages offered by the proposed designs. ...
... The purpose of MBA Radix-2 is to reduce the partial products of the multiplication operation, and MBA Radix-4 is used to cut the partial products of MBA Radix-2 by half (Kaur, S. & Manna, M.S. 2013). The Booth algorithm, created by A. D. Booth (Booth, A.D. 1951), forms the basis of signed-number multiplication algorithms that are simple to implement in hardware and have the potential to speed up signed multiplication (Booth, A.D. 1951). Booth's algorithm contained the multiplier y, value z, leaving multiplicand. ...
Article
This paper presents the performance of the Radix-4 Modified Booth Algorithm. The Booth algorithm is a multiplication algorithm that multiplies two signed binary numbers in two's complement notation. The multiplier is a fundamental component in general-purpose microprocessors and in digital signal processors. With advances in technology, researchers design multipliers which offer high speed, low power, and less area. The Booth multiplier algorithm is designed to reduce the number of partial products as compared to a conventional multiplier. The proposed design is simulated using Verilog HDL in Quartus II and implemented in a Cyclone II FPGA. The result shows that the average output delay is 20.78 ns. The whole design has been verified by gate-level simulation.
... The first section discusses the scope and types of multipliers, and the second section discusses the pipeline concept. Achieving higher throughput in arithmetic operations is important for reaching the desired performance in many real-time applications [1]. In recent days the multiplication circuit plays an important role in evaluating the performance of the computer. ...
... In recent days the multiplication circuit plays an important role in evaluating the performance of the computer. Along with this, designing a fast multiplier is also a key constraint for developers. It is therefore essential to develop fast multipliers with low delay, low power consumption, and high throughput [1,2,5]. ...
Article
Full-text available
In many digital computers, multipliers play a vital role in improving the performance of the system. The speed of the processor greatly depends on high-speed multipliers. On the other hand, pipeline technology plays an important role in present parallel computers in improving the speed of the computer. In the present paper, a high-speed Vedic multiplier is designed and analysed by inserting a pipeline in the process of computation. The computation speed is considerably improved with the pipeline when compared with a conventional Vedic multiplier. The proposed pipelined Vedic multiplier, implemented in TSMC 65 nm CMOS technology, consumes 57 mW of power when operated at 100 MHz.
... The main frequency of the microprocessor is determined by the frequency at which the multiplier completes a set of data calculations [2]. In 1951, A. D. Booth proposed the method of Booth coding [3], and in 1961 O. L. MacSorley improved the Booth algorithm; the improved version is also known as the Radix-4 Booth algorithm. This method reduces the number of partial products by half. ...
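
To illustrate how the radix-4 variant halves the partial-product count, the sketch below recodes a two's-complement multiplier into digits from {-2, -1, 0, 1, 2} by examining overlapping groups of three bits. It is a software model under assumed names (`radix4_booth_digits`, `radix4_booth_multiply`) and an assumed even word width, not the hardware design of any cited paper.

```python
def radix4_booth_digits(multiplier: int, bits: int) -> list[int]:
    """Recode a `bits`-bit two's-complement multiplier into radix-4 digits.

    Returns bits // 2 digits in {-2, -1, 0, 1, 2}, least significant first.
    """
    if bits % 2 != 0:
        raise ValueError("this sketch assumes an even word width")
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier must fit in the given width")

    pattern = multiplier & ((1 << bits) - 1)
    extended = pattern << 1  # append the implicit 0 below the LSB
    digits = []
    for i in range(0, bits, 2):
        group = (extended >> i) & 0b111  # bits y[i+1], y[i], y[i-1]
        y_prev = group & 1
        y_low = (group >> 1) & 1
        y_high = (group >> 2) & 1
        digits.append(y_prev + y_low - 2 * y_high)
    return digits


def radix4_booth_multiply(multiplicand: int, multiplier: int, bits: int = 8) -> int:
    """Sum one partial product per radix-4 digit (bits // 2 in total)."""
    digits = radix4_booth_digits(multiplier, bits)
    return sum(d * multiplicand * 4**j for j, d in enumerate(digits))
```

An 8-bit multiplier therefore produces four partial products instead of eight; each is 0, ±1, or ±2 times the multiplicand, which in hardware needs only a shift and an optional negation.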
Article
Full-text available
Among the many microprocessors, RISC-V as an open-source instruction set is gradually gaining popularity among academia and industry. The performance of the multiplier in the microprocessor imposes constraints on the computational power of the processor. In order to improve the efficiency of multiplication instructions, a pipeline multiplier is implemented in this paper. Firstly, the partial product is generated using the Radix-4 Booth. Secondly, the Wallace tree structure is used to accelerate the compression of the partial product. Then, a parallel prefix adder is used to calculate the resulting partial product to improve the timing. Finally, registers are added as a pipeline to achieve a high-efficiency multiplication calculation. With the operating voltage and temperature set to typical conditions, the integrated multiplier area is 50260.6 μm², and the power consumption is 20.41 mW. The final frequency of the multiplier is 1 GHz in gate-level simulation.
... Some known and mostly used methods are the binary and the signed binary representation as complementary recoding [13][14][15], NAF [16][17][18][19], MOF [20], and DRM [21]. Two operations determine the speed of ECSM; they are the elliptic curve adding (ECADD) and doubling (ECDBL) operations. ...
... Therefore, it is important to speed up the multiplier to improve the overall hardware performance. The booth multiplication algorithm is a multiplication algorithm faster than general multiplication methods [30]. General multiplication operations deal with unsigned numbers and partial product operations increase by the number of bits of the multiplier or by the number of 1s present in the multiplier. ...
... However, these designs employed multiple processing units for the generation, reduction, and addition of possible partial products. A proposed novel modification in the standard R2IM algorithm is presented in algorithm 5. Two modifications based on Montgomery laddering (ML) [50] and Booth encoding (BE) [51] in combination with radix-4 are proposed. Note that our modifications to the algorithm involve constructing detailed dataflow graphs to examine the relationship between critical operations and eliminate redundant operations, thus introducing parallelism at the expense of lower hardware cost. ...
Article
Full-text available
Elliptic Curve Cryptography (ECC) based security protocols require much smaller key space which makes ECC the most suitable option for resource-constrained devices as compared to the other public key cryptography (PKC) schemes. This paper presents a highly efficient area-delay optimized ECC crypto processor over the general prime field (F_p). It is based on a new novel finite field multiplier (FFM) where several optimization techniques have been incorporated to reduce the latency and hardware resource consumption. The proposed FFM architecture is embedded with a finite field adder/subtractor (FFAS) unit which is utilized to perform FFAS operations instead of deploying a dedicated unit. The Common Z (Co-Z) coordinates with the Montgomery ladder algorithm are adopted to perform point multiplication, a core operation in all ECC-based crypto protocols. The work also proposes an efficient scheduling strategy to execute low-level finite field arithmetic primitives with minimum latency on the employed finite field arithmetic units. Due to these techniques, the proposed ECC processor is optimized for hardware resources, latency, and throughput. It is captured in Verilog-HDL, synthesized, and implemented on Virtex-7, Kintex-7, and Virtex-6 FPGA platforms using Xilinx Vivado and ISE Design Suite tools. On the Virtex-7 FPGA platform, it performs a 256-bit point multiplication operation in 0.7 ms, consumes just 6.2K slices, and delivers a throughput of 1428 operations per second. The implementation results show that it is a highly efficient design outperforming the state-of-the-art by providing a better area-delay product and higher efficiency. Therefore, it has the potential to be deployed in many applications where both latency and resource requirements are critical.
... In 1951, Booth suggested an integer representation utilizing the numerals −1, 0, and 1 (Booth, 1951). Building on this proposal, Reitwiesner proved in 1960 that every integer could be uniquely represented (Reitwiesner, 1960). ...
Article
Full-text available
WhatsApp is one of the world’s most widely used messaging applications, with more than 2.44 billion monthly active users worldwide as of 2022. To ensure the security and privacy of its users’ communications, WhatsApp uses end-to-end encryption (E2EE) to protect messages and calls from unauthorized access, including WhatsApp itself. However, the E2EE can be computationally intensive and pose challenges for low-end devices or slow network connections. This paper proposes a lightweight E2EE protocol optimized for mobile devices that reduces energy consumption and computational overhead while maintaining the same security level.
... Based on the polarization of the polarizer, the control light path is divided into two areas (H, V). According to Decrease-Radix Design theory, when constructing a specific logic operator, first we should select the pixels of each type of main optical path, then give the values of the reconstruction instruction bits k1, k2 and k3 for these pixels [35]. After the computational task is completed, the logic operator will release all the tiny SBUs that are occupied, in order to reuse them next time. ...
Article
Full-text available
Ternary Optical Computer (TOC) is more advanced than traditional computer systems in parallel computing, which is characterized by huge amounts of repeated computations. However, the application of the TOC is still limited because of lack of key theories and technologies. In order to make the TOC applicable and advantageous, this paper systematically elaborates the key theories and technologies of parallel computing for the TOC through a programming platform, including reconfigurability and groupable usability of optical processor bits, parallel carry-free optical adder and the TOC's application characteristics, communication file to express user's needs and data organization method of the TOC. Finally, experiments are carried out to show the effectiveness of the present theories and technologies for parallel computing, as well as the feasibility of the implementation method of the programming platform. For a special instance, it is shown that the clock cycle on the TOC is only 0.26% of on a traditional computer, and the computing resource spent on the TOC is 25% of that on a traditional computer. Based on the study of the TOC in this paper, more complex parallel computing can be realized in the future.
... Consider an $M \times N$-bit multiplier, where $X = x_{M-1} x_{M-2} \cdots x_1 x_0$ and $Y = y_{N-1} y_{N-2} \cdots y_1 y_0$ are the multiplier and multiplicand, respectively. The $N$ rows of the $M$-bit partial products $PP_{i,j}$ can be expressed as: ... Fast multipliers employ the modified Booth encoding algorithm to reduce the height of the partial products [22,23]. By using the radix-$R = 2^r$, $r > 0$, Booth multiplier, partial products are reduced from $N$ to $(N+1)/r$, thus reducing the size and delay of the reduction tree. ...
Article
Full-text available
Systolic arrays are an integral part of many modern machine learning (ML) accelerators due to their efficiency in performing matrix multiplication that is a key primitive in modern ML models. Current state-of-the-art in systolic array-based accelerators mainly target area and delay optimizations with power optimization being considered as a secondary target. Very few accelerator designs directly target power optimizations and that too using very complex algorithmic modifications that in turn result in a compromise in the area or delay performance. We present a novel Power-Intent Systolic Array (PI-SA) that is based on the fine-grained power gating of the multiplication and accumulation (MAC) block multiplier inside the processing element of the systolic array, which reduces the design power consumption quite significantly, but with an additional delay cost. To offset the delay cost, we introduce a modified decomposition multiplier to obtain smaller reduction tree and to further improve area and delay, we also replace the carry propagation adder with a carry save adder inside each sub-multiplier. Comparison of the proposed design with the baseline Gemmini naive systolic array design and its variant, i.e., a conventional systolic array design, exhibits a delay reduction of up to 6%, an area improvement of up to 32% and a power reduction of up to 57% for varying accumulator bit-widths.
... In the present scenario, designing a fast multiplier is a key constraint in evaluating the performance of a digital system. To achieve a fast and efficient multiplier it is essential to design a multiplier with lower power consumption, less area and good throughput [1][2][3]. The principles and various techniques involved in the Vedic multiplier are discussed, and in the present paper the principle UrdhvaTiryagbyam [2][4] (vertically and crosswise) is adopted. ...
Article
Full-text available
In recent days advanced digital process demands more sophisticated parameters such as throughput, power and area. It is very difficult to maintain high throughput while maintaining optimum power consumption and cell area. In most of the digital systems multipliers are deciding their performance in terms of above parameters. In the present work high speed Vedic multipliers are designed with pipeline technology. As the MAC speed is decided by Vedic Multiplier, in the present paper Multiplier and Accumulator (MAC) is designed with two way pipeline technology to meet high throughput. Vedic Multipliers are used in designing MAC unit as they are fast multipliers and further enhancing the data speed. The MAC is implemented with Cadence Encounter(R) RTL Compiler.
... The multiplierless shift-and-add approach to solve the MCM problem finds its first solution from the binary representation of the constant which is literally a shift-and-add decomposition of integers. A greedy algorithm based on the Canonical Signed Digit (CSD) representation permits to reduce the number of adders [7], [8] compared to the binary representation. However, this number can be further reduced and heuristics have been proposed to enhance the results obtained with the CSD method [9]- [13]. ...
Article
Full-text available
Multiple Constant Multiplication (MCM) over integers is a frequent operation arising in embedded systems that require highly optimized hardware. An efficient way is to replace costly generic multiplication by bit-shifts and additions, i.e. a multiplierless circuit. In this work, we improve the state-of-the-art optimal approach for MCM, based on Integer Linear Programming (ILP). We introduce a new low-level hardware cost metric, which counts the number of one-bit adders and demonstrate that it is strongly correlated with the LUT count. This new model permitted us to consider intermediate truncations that permit to significantly save resources when a full output precision is not required. We incorporate the error propagation rules into our ILP model to guarantee a user-given error bound on the MCM results. The proposed ILP models for multiple flavors of MCM are implemented as an open-source tool and, combined with an automatic code generator, provide a complete coefficient-to-VHDL flow. We evaluate our models in extensive experiments, and propose an in-depth analysis of the impact that design metrics have on synthesized hardware.
Article
Electronic systems are widely used by humans nowadays in all aspects of daily life. Today, there is no living for humans on Earth without any electronic products. High-speed, low-power, and low-area electronic systems are what the current generation needs. In digital systems, a variety of arithmetic circuits are employed, such as adders, multipliers, and dividers. There are various multipliers, using various methods, for obtaining the product of a multiplier and a multiplicand. One of them is the Radix-4 multiplier. The Radix-4 multiplier produces n/2 partial products, where n is the multiplier's bit count. This multiplier has a high operating speed, power dissipation, and surface area. Area, power dissipation, and propagation delay can all be reduced by reducing the number of partial products of the n-bit multiplier. For an n-bit multiplier, Radix-8 produces n/3 partial products. The area, delay, and power dissipation are reduced as a result. 8-bit Booth multipliers for Radix-4 and Radix-8 are designed and implemented using an FPGA. For both multipliers, delay, power dissipation, and area are compared. According to the comparison, the Radix-8 Booth multiplier performs better than the Radix-4 Booth multiplier in terms of delay, power dissipation, and area. Therefore, the Radix-4 Booth multiplier can be swapped out for the Radix-8 Booth multiplier.
Article
A multiplier is a key component in arithmetic and logical units. So low power consuming, area efficient speedy multiplier architecture is need of today’s Arithmetic and Logic Unit. In the ancient Indian Vedic mathematics, the Urdhvatiryagbhyam sutra gives a multiplication methodology. Vedic multiplier improves the performance parameters of the digital circuit logically as well as physically. The performance of the multiplier depends upon a reduction in partial product generation. This is achieved with the help of a 4:2 compressor. In this paper, a logically improved, area-optimized 4:2 compressor is proposed. The proposed design is implemented in standard CMOS 45 nm technology and analyzed the results post layout design. For performance analysis, the novel proposed 4:2 compressor is used in a Vedic multiplier. After simulation, we get results which shows that the proposed compressor has 62.95% of power reduction and 53.31% delay reduction compared to the conventional compressor. The compressor shows a 44.34–10.57% reduction in ADP and 149.84–12.59% reduction in PDP against present variants. The proposed 4 Bit Vedic multiplier shows a 9.633% power reduction compared to the Vedic multiplier with a conventional compressor. The 4 Bit Vedic multiplier shows a 44.877–7.14% reduction in Area-Delay product and 27.99–0.22% reduction in Power-Delay product against present variants.
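
For readers unfamiliar with the building block named in the abstract above, here is a bit-level sketch of a conventional 4:2 compressor built from two chained full adders. This is the textbook baseline structure, not the improved compressor the abstract proposes; the names `full_adder` and `compressor_4_2` are illustrative.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) for three input bits."""
    total = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return total, carry


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Compress four bits plus a carry-in into (sum, carry, cout).

    Invariant: x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout).
    """
    s1, cout = full_adder(x1, x2, x3)       # cout does not depend on cin
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout
```

Because `cout` is produced before `cin` is consumed, a row of these cells does not ripple a carry horizontally, which is why compressors are used to shorten partial-product reduction.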
Article
The increasing speed of computer processors with each passing day has required the design of arithmetic circuits to be verified as high performance. For this reason; by being observed the computer arithmetic, it enabled faster algorithms to come out and verifications of hardware in terms of the facilities that technology provides. The main aim of the computer arithmetic is the design of the circuits and algorithms that will increase the speed of the numerical process. To this end, the design of arithmetic multiplication circuits with a faster and higher bit length is presented through the efficient bit reduction method in this paper. The developed fast and efficient algorithms for arithmetic multiplication process by using the efficient bit reduction method have been observed in this work. By making changes in some multiplication methods that are based on Vedic math’s, the higher bit length circuits of multiplication circuits in the literature which are 4 bits have been developed by using some basic properties of multiplication like decomposition and bit shifting. Analysis of arithmetic circuits is implemented by verifying functionally with VHDL simulations, getting output signal waveform and measurements of delay time. All the circuits of hardware that are observed have been described via VHDL and the performances of multiplication circuits that are synthesized have been presented via FPGA.
Chapter
While ASIC-based hardware platforms provide better application-specific cost–accuracy trade-offs, the diversity of embedded systems deploying machine learning algorithms has risen steadily. Consequently, given their reconfigurability and high performance, FPGA-based hardware platforms are increasingly used for embedded machine learning. However, the low-power designs devised for ASICs, using methods such as precision scaling, approximate computing, and mixed/custom quantization, do not result in proportionate gains when implemented on FPGAs. This lack of proportional gains can be attributed primarily to the lack of optimizations for FPGA’s LUT-based architecture in the ASIC-optimized designs. Consequently, there has been active research on improving the efficacy of low-power methods in FPGA-based systems. In this chapter, we provide an overview of such FPGA-oriented low-power design methods and delve into the details of selected works that report considerable improvements in this regard. Specifically, we cover custom optimizations for both accurate and approximate multiplier designs and MAC units employing mixed quantization of Posit and fixed-point/integer number representations.
Article
Full-text available
A universal rewrite system (URS) is proposed as the driver of all physical processes at all levels. Beginning with a realisation of its manifestation in algebra, we proceed to show its significance in computation theory and category theory. This includes the simplest possible manifestation of the URS as a computational system. By coincidence the use of seven lettered components in the computation realisation of the basic unit used in physics and other applications (order 64) can be linked to the standard musical scale.
Chapter
The extensive instruction-set for deep learning (DL) significantly enhances the performance of general-purpose architectures by exploiting data-level parallelism. However, it is challenging to design arithmetic units capable of performing parallel operations on a wide range of formats to perform DL instructions (DLIs) efficiently. This paper presents a multi-level parallel arithmetic architecture capable of supporting intra- and inter-operation parallelism for integer and a wide range of FP formats. For intra-operation parallelism, the proposed architecture supports multi-term dot-product for integer, half-precision, and BrainFloat16 formats using mixed-precision methods. For inter-operation parallelism, a dual-path execution is enabled to perform integer dot-product and single-precision (SP) addition in parallel. Moreover, the architecture supports the commonly used fused multiply-add (FMA) operations in general-purpose architectures. The proposed architecture strictly adheres to the computing requirements of DLIs and can efficiently implement them. When using benchmarked DNN inference applications where both integer and FP formats are needed, the proposed architecture can significantly improve performance by up to 15.7% compared to a single-path implementation. Furthermore, compared with state-of-the-art designs, the proposed architecture achieves higher energy efficiency and works more efficiently in implementing DLIs.
Keywords: Deep Learning instruction; data-level parallelism; Arithmetic Architecture; Dot-Product; Mixed-Precision; Inter- and Intra-operation Parallelism
Article
The Earth’s infrared energy storage is huge, and if it can be used on a large scale, it will effectively improve the Earth’s greenhouse effect. In this paper, using water as a good energy storage vector, a long wave energy storage system (LWESS) was designed, which can radiate long wave infrared with huge energy density and fixed bandwidth or fixed wavelength. 1. The feasibility of using hollow glass and infrared fiber to deliver LWESS for cooling and heating is presented. 2. On the basis of literature, principle of the system achieving thermoelectric power generation is described; 3. Using antenna theory to derive the composition structure of planar antenna and propose the selection requirements of rectifier diode, which provides the theoretical and practical basis for converting long-wave radiation using Rectenna; Finally, the Quantum theory of long-wave radiation is used to propose the idea of manufacturing long-wave radiation module which can directly convert energy, and the long-wave energy storage system has unique advantages and can quickly realize the harmony between human and nature.
Article
Elliptic curve point multiplication is the main primitive required in almost all security schemes using elliptic curve cryptography (ECC). It is the leading computationally intensive operation that sets the overall performance of the associated cryptosystem. This work presents a highly novel area–time efficient elliptic curve point multiplier over a general prime field . It is based on an efficient radix‐2 ³ parallel multiplier, which performs a ‐bit multiplication in clock cycles. On the system level, the twisted Edwards curves with unified point addition using projective coordinates are adopted, where an efficient scheduling technique is presented to schedule several operations on deployed modular arithmetic units. Due to the introduced optimization at different stages of the design, latency, hardware resource requirement, and total clock cycle count are reduced significantly. Synthesis, and implementation of the proposed design over different Xilinx FPGA platforms are completed using the Xilinx ISE Design Suite tool for key sizes of 192, 224, and 256 bits. The 256‐bit Xilinx Virtex‐7 FPGA implementation reveals that it completes a single point multiplication operation in 0.8 ms and occupies 6.7K FPGA slices in a clock cycle count of 132.2K. It produces significantly better area–time product and throughput per slice than the contemporary designs. The proposed design also has the potential to counter simple power analysis and timing attacks. Thus, it is an elegant solution to develop ECC‐based cryptosystems for applications, where both speed and hardware resource consumption are important.
Chapter
The multiplier is the core part of the processor and embedded systems. Multiplication is intensive on hardware, and the criteria of interest are higher performance and low-power arithmetic structure of the processor. If the power consumed by the multipliers is reduced, the overall power consumption in the processor is reduced drastically. The projected multiplier circuit comprises a partial product generator (PPG) and adder trees. The design of PPG and adder tree can be optimized separately. This work proposes a novel Radix-4 Booth Multiplier PPG technique using Pass Transistor Logic. The method using PTL helps reduce the transistor count and the delay in the circuit. The design uses a pre-encoding phase which switches off the encoder in zero state to save power. The gates are modified to include CMOS inverters in intermediate design parts, which helps level the circuit and reduce the noise from the input. The parameters compared are the delay in generating the partial product power consumption by the partial product generator and the power delay product of the output signal.
Keywords: Multiplier; Booth encoding; Partial Product Generator (PPG); Low power design; Pass Transistor Logic (PTL); Level restoration
Chapter
Side-channel attacks (SCA) enable attackers to gain access to non-disclosed information by measuring emissions of a system, e.g., timing, electromagnetic waves or power consumption. The emissions of a system can typically only be measured on the final system. As a consequence, the analysis of such security threats is often only possible at a very late stage in the development process. In this paper, we present an approach to simulate timing attacks in early stages of the development process with SystemC and discuss the potentials and limitations of this approach. Our results show that the simulation of SCA in SystemC is generally possible, but currently difficult due to an explanation gap. It is, to the best of our knowledge, not well understood where the causal connection between physical quantities and data, which is exploited in SCA, comes from. This poses a major challenge for the design of precise models that accurately reflect physical insights for early security analysis.
Keywords: Side-Channel Attacks; Modeling and Simulation; SystemC
Conference Paper
Wireless sensor networks (WSNs) play a primary role as emerging platforms in applications such as the military and banking sectors. A sensor network may be subject to various malicious attacks, and it must be protected against them for security purposes. This paper gives an overview of WSNs, intrusion detection in WSNs, the types of intrusion detection methodology, and a comparative review of existing methods. It proposes a hybrid intrusion detection system (HIDS) for clustered WSNs.
Chapter
The multiple-valued logic (MVL) multiplier plays an important role in today's arithmetic models, signal processing, FIR filters, big data processing, and many other applications. Owing to technological advances, several researchers have been working on multipliers that target one of the following design goals: high speed, low power consumption, regular layout, and hence smaller area. Cadence integrated-circuit simulation software was used to implement a multiplier. The simulation results show that the design is more effective than the binary multiplier: the circuit reaches a high speed, is smaller, and uses a minimal number of transistors. This paper then applies the principles of multiple-valued logic to the design of multiple-valued octal logic multipliers. The most important factor is that the implementation of MVL circuits is superior to that of binary-valued circuits. Using a single wire to transmit several current or voltage levels reduces wiring complexity compared with binary circuits, which results in more information per wire, improved computing capability, and more cost-effective circuits. The current-mode multiplier is based on radix 8. A multiplier is an electronic circuit used by computing devices such as processors to multiply two numbers. Multiplication is a basic arithmetic operation that accounts for 8.72 per cent of all instructions (Asadi, 07). Multiplication is also a long-latency operation: a typical multiplication takes between 2 and 8 cycles (Santoro, 89). High-speed multipliers are therefore crucial to processor performance. Different methods can be used to implement a multiplier; one of these techniques is MVL. Multiple-valued, or many-valued, logic is a discrete p-valued system.
Multiple-valued logic consists of discrete p-valued systems with p > 2, that is, non-binary-valued systems (Dubrova, 99). The number of discrete logic levels in MVL is not limited to two. This differs from binary logic, which has only two levels, logic level 0 and logic level 1; however, discrete variables with an infinite number of values can be called MVL (Miller, 07). MVL has many useful applications and has been used in the design of digital devices that use more than two discrete signal levels, such as multiple-valued memories, multiple-valued arithmetic structures, and field-programmable gate arrays. The key purpose of this work is therefore the development of the MVL multiplier. Previously, Hanyu and Kameyama developed a 200 MHz 54x54-bit multiplier using multiple-valued current-mode MOS circuits (Hanyu, 95). The efficiency of the multiplier is calculated to be approximately 1.4 times higher than that of the corresponding binary configuration under normalized energy dissipation. The special architecture of the current quaternary multiplier is described in (Chu, 95). Without bias generation circuitry, the device comprises 49 MOS transistors and simulates worst-case delay with 0.2 µm CMOS technology, producing output currents of around 10ms. Shimabukuro and Zukeran proposed modulo-7 multipliers with a barrel shifter and a sign inverter consisting of 60 transistors. Simulation results for 0.8 µm CMOS technology and a 5 V power supply indicate a delay time of 4.32 ns (Shimabukuro, 98). Previous works indicate that quaternary was the most widely used radix, but in MVL a higher radix would best represent as many numbers as feasible. Digital arithmetic operations are of considerable significance in the design of digital processors and application-specific devices. Arithmetic units are a significant component of digital circuits.
Among arithmetic circuits, adders are the most widely used basic building blocks, e.g., in the processor cores of DSPs. With only the logic levels 0 and 1, binary adders are simple to implement; however, they have disadvantages in circuit complexity and chip size, which eventually increase the propagation delay of the circuits (Leela, 15). The full-adder circuit occupies a much smaller chip area and exhibits faster, more complex behavior than its binary alternatives. The drawbacks of current-mode multi-valued logic circuits are higher static dissipation and lower noise margins for high radix (Temel, 04). A high-speed redundant binary architecture has been developed for a fast parallel CMOS multiplier. This design transforms a pair of partial products in normal binary form into a specific redundant binary number without any additional circuit. Makino et al. enhanced the redundant binary adder circuit that adds the redundant binary partial products, and also optimized the converter circuit that converts the final redundant binary number to the corresponding binary number. The converter circuit is implemented primarily with multiplexer circuits. A 54x54-bit multiplier is fully compatible with this architecture (Makino, 96). The width of the transmission gates can be expanded to speed up all modifications. Trade-offs in area, speed, and power must be determined in the scope of each specific application (Current, 95). Kawahito et al. showed that in some cases, such as Wallace trees, MVL circuits have an inherent advantage. The advantages are especially noticeable with current-mode circuits (Kawahito, 87). Clarke and Nudd suggested different methods for constructing block libraries for general logic and arithmetic. These techniques enable the design of any circuit at basin level in typical MVL mode (Clarke, 94). Sheng et al.
suggest a new approach for ternary logic circuits using carbon nanotube FETs. Ternary logic can be a viable alternative to the traditional binary approach to logic design, since in contemporary digital architecture it reduces overheads such as interconnect and chip area, which promotes stability and cost-effectiveness. In the fully implemented design, combining the suggested ternary gates with binary gates reduces the power-delay product by more than 90% (Lin, 09). The efficiency of 3D graphics and signal processing systems relies heavily on the performance of multiplication, because these workloads are highly multiplication-intensive. A great deal of work has also been done on sophisticated multiplication algorithms and patterns (Booth, 51), (Dadda, 65), (Elguibaly, 00), (Fadavi, 93), (Itoh, 01), (Kang, 04), (Kang, 93), (Nagamatsu, 89), (Oklobdzija, 96), (Santoro, 89), (Stelling, 98), (Wallace, 64), (Weinberger, 81), (Yeh, 00). There are three main steps in each multiplication.
Chapter
Following preliminary research and analysis, an eight-story residential building was modeled in SAP2000. Several fire scenarios were simulated on different floors, including on exposed primary load-carrying steel elements, to generate strength-temperature findings. The strength of the steel elements was determined by analyzing their axial compressive force, moment, and shear forces under various conditions, including yield and breaking stresses and changes in the modulus of elasticity. Additionally, the temperature parameters of the reinforcement mesh used in the concrete block at different depths were assessed by considering the variation in axial, moment, and shear forces when the steel elements were protected and when they were not. The displacement of the slab due to the high-temperature effect over time was also evaluated and compared to the allowable deflection. Based on these findings, the impact of fire on the performance of steel structures was analyzed in accordance with the ISO 834 standard adopted by European Standards. To develop a new mathematical model, the Eurocode design equations (CEN, 2005a, 2005b, 2005c) and Visual Basic for Applications (VBA) in Excel were employed.
Chapter
Multiplication is one of the most widely used arithmetic operations in a variety of applications, such as image/video processing and machine learning. FPGA vendors provide high-performance multipliers in the form of Digital Signal Processing (DSP) blocks. These multipliers are limited in number and have fixed locations on FPGAs; they can also create additional routing delays and may prove inefficient for smaller bit-width multiplications. Therefore, FPGA vendors additionally provide optimized soft IP cores for multiplication. However, in this work, we advocate that these soft multiplier IP cores for FPGAs still need better designs to deliver high performance and resource efficiency. Towards this end, this chapter presents various designs of resource-optimized and high-performance accurate unsigned/signed and constant multipliers for FPGA-based systems. Compared to the multiplier IPs provided by the FPGA synthesis tool, our designs offer better resource utilization, lower critical path delay, and better energy efficiency.
Chapter
To keep this work self-contained, this chapter provides the basics of multiplier structures as well as the state-of-the-art formal verification techniques proposed to verify them. Moreover, the theoretical background of SCA and its application in the verification of arithmetic circuits are explained, which is the main focus of this book.
T. J. REY and R. E. SPENCER, Sign Correction in Modulus Convention, Cambridge Conference Report, 1950.