Cycle-Accurate Verification of the Cryptographic Co-Processor for the European Processor Initiative
Pietro Nannipieri1[0000-0002-2538-5440], Stefano Di Matteo1[0000-0002-5711-432X], Luca Crocetti1[0000-0001-8504-8203], Luca Zulberti1[0000-0001-9599-2652], Luca Fanucci1[0000-0001-5426-4974], and Sergio Saponara1[0000-0001-6724-4219]
Department of Information Engineering, University of Pisa
Via G. Caruso 16, Pisa, Italy
pietro.nannipieri@unipi.it
Abstract. This paper presents a cycle-accurate verification environment for the Crypto-Tile, a cryptographic accelerator integrated into the EPI General Purpose Processor. The focus of this work is to provide a robust methodology for validating the functionality and performance of the Crypto-Tile. The verification environment includes an in-depth examination of the internal architecture and operational aspects of the Crypto-Tile, allowing for accurate modelling of hardware components and emulation of Direct Memory Access (DMA) operations. Developers can leverage this environment to simulate and verify their C-Code implementations, utilizing the functions available in the Crypto-Tile library or creating custom libraries. The verification process involves using the 32-bit AXI4 interface for communication between the processor and the Crypto-Tile while emulating DMA operations to ensure accurate testing.
Keywords: AES, ECC, RNG, SHA, RISC-V, EPI, Cryptoprocessor, Hardware, Verification, Cycle-accurate
1 Introduction
In today’s interconnected world, where digital information flows seamlessly across networks, the importance of cryptography cannot be overstated. From securing online transactions and safeguarding personal information to protecting national security interests, cryptography plays a pivotal role in upholding the trust and privacy of individuals, organizations, and governments. As the reliance on computing systems continues to grow exponentially, so does the need for robust and efficient cryptographic processors [1,2,5,7,12]. In this context, the European Processor Initiative (EPI) [6] represents a groundbreaking collaborative effort among European Union member states, research institutions and industry partners to develop a cutting-edge high-performance computing ecosystem. The EPI aims to develop a new generation of energy-efficient and high-performance processors. With digital transformation permeating every aspect of society, achieving self-reliance in processor technology has become imperative for Europe’s strategic autonomy. By fostering homegrown expertise and innovation in processor design, the EPI seeks to reduce dependence on non-European technology providers, enhance Europe’s technological competitiveness, and strengthen its position in the global market. Developing a secure and efficient cryptographic processor is an integral part of the EPI’s broader vision, as it addresses the growing need for robust encryption capabilities to safeguard sensitive data and critical information infrastructures against ever-evolving cyber threats.
When dealing with cryptographic algorithms, the computational complexity poses challenges for traditional software execution on Central Processing Units (CPUs). Cryptographic operations involve data manipulations that demand substantial processing power. The execution of these algorithms purely through software implementations can result in significant performance bottlenecks and increased execution times. To overcome these limitations, hardware acceleration emerges as a compelling solution. This approach improves the overall performance of cryptographic operations and ensures the secure processing and protection of sensitive data, making it a preferred choice in scenarios where efficient and high-performance cryptography is paramount. As a solution, the EPI system integrates the Crypto-Tile IP core, which is a dedicated hardware component for cryptography. It plays a crucial role in providing hardware acceleration for cryptographic algorithms, enabling efficient and secure encryption and decryption operations, and robust protection mechanisms for security-critical assets, such as keys.
Cycle-accurate verification plays a pivotal role in the development and validation of complex hardware designs, such as the Crypto-Tile. Simulating the hardware at the cycle level enables a more granular and comprehensive analysis of the design’s functionality, performance, and compliance with desired specifications. By precisely modelling the hardware behaviour, timing constraints, and interactions with software components, it becomes possible to identify and rectify potential design flaws or bugs. Additionally, cycle-accurate verification enables detailed measurement and analysis of power consumption and latency in software-accelerated hardware primitives. This work aims to provide an overview of the cycle-accurate verification environment developed to validate the Crypto-Tile co-processor. By presenting the verification approach followed using a cycle-accurate verification environment, we strive to contribute to the broader understanding of the rigorous verification processes undertaken for critical hardware components within the EPI framework.
2 The Crypto-Tile Within the European Processor Initiative
The Crypto-Tile [11] is a cryptographic accelerator designed to provide a comprehensive and versatile range of cybersecurity services with advanced security features. It incorporates various engines to support different cryptographic functions. The Advanced Encryption Standard (AES) engine [10] supports both AES-128 and AES-256 ciphers, offering a minimum security strength of 128 bits in terms of both classical and post-quantum security. The Elliptic-Curve Cryptography (ECC) engine [4] enables operations on elliptic curves with widths of 256 and 521 bits, providing symmetric-key equivalent security strength for classical security. The Secure Hash Algorithm (SHA) engine [8] generates digests of at least 256 bits and 384 bits through SHA-2 and SHA-3 functions, meeting the minimum strength requirement of 128 bits. Additionally, the Crypto-Tile features a True Random Number Generator (TRNG) engine [9] that meets the security requirements for cryptographic applications.
[Figure 1: block diagram. The Crypto-Tile contains the global configuration (Cfg), control (Ctrl) and status registers, reached through a 32-bit AXI4 interface to/from the Configuration bus, and four cryptoprocessors (AES, ECC, SHA, RNG). Each cryptoprocessor comprises Cfg, Ctrl and Status registers, an FSM, an engine with its data registers, and a 128-bit AXI4 interface to/from the Data bus; the AES and ECC cryptoprocessors also include key slots.]
Fig. 1. Crypto-Tile Blocks Scheme
The internal architecture of the Crypto-Tile (Figure 1) consists of several key components. These include a 32-bit AXI4 interface, which serves as a Slave Memory-Mapped interface for accessing the registers of the Crypto-Tile through the Configuration bus. The Global Management Unit handles the global configuration, control, and status of the Crypto-Tile. Each of the four independent cryptoprocessors is dedicated to a specific class of cryptographic algorithms: AES [10], ECC [4], SHA [8], and Random Number Generator (RNG) [9]. These cryptoprocessors function as coprocessing units for the main processor they are connected to, such as the Secure Micro-Controller Unit (MCU) or the Secure Element (SE, a compact micro-controller that handles the first stage of the secure boot routine). Each cryptoprocessor includes local registers for configuration, control, and status, and a Finite State Machine (FSM) for managing cryptographic operations. They also feature engines for hardware acceleration of cryptographic algorithms and functions. The AES and ECC cryptoprocessors incorporate local resources for key storage and management. To facilitate high-bandwidth transfers, four independent 128-bit AXI4 interfaces, one for each cryptoprocessor, provide access to the data registers. The performance and complexity of the Crypto-Tile are reported in Table 1.
Table 1. Implementation results for the Crypto-Tile
Technology                Entity        Max Freq    Complexity
Xilinx VU37P FPGA         Crypto-Tile   150 MHz     27507 CLB; 144892 CLB LUTs; 93503 CLB Reg; 64 DSPs
Xilinx VU37P FPGA         RNG Engine    260 MHz     2294 CLB; 10154 CLB LUTs; 7122 CLB Reg; 0 DSPs
Xilinx VU37P FPGA         SHA Engine    190 MHz     3433 CLB; 10290 CLB LUTs; 10787 CLB Reg; 0 DSPs
Xilinx VU37P FPGA         ECC Engine    95 MHz      15151 CLB; 79219 CLB LUTs; 37626 CLB Reg; 64 DSPs
Xilinx VU37P FPGA         AES Engine    170 MHz     1253 CLB; 6036 CLB LUTs; 2460 CLB Reg; 0 DSPs
7 nm Std-Cell Technology  Crypto-Tile   3.7 GHz     1325.16 kGE
7 nm Std-Cell Technology  RNG Engine    4.325 GHz   127.16 kGE
7 nm Std-Cell Technology  SHA Engine    3.725 GHz   128.32 kGE
7 nm Std-Cell Technology  ECC Engine    1.525 GHz   658.90 kGE
7 nm Std-Cell Technology  AES Engine    2.425 GHz   56.01 kGE
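The bandwidth advantage of the 128-bit data interfaces over the 32-bit configuration interface described in this section can be illustrated with a simple beat count. The following Python sketch assumes ideal AXI4 transfers in which every beat carries a full bus word and no protocol overhead is considered; the function name and the 4096-byte payload are illustrative choices and are not taken from the Crypto-Tile documentation.

```python
def axi_beats(payload_bytes: int, bus_width_bits: int) -> int:
    """Return the number of data beats needed to move a payload.

    Idealised model: every beat carries a full bus word, so narrow or
    unaligned transfers and handshake stalls are not considered.
    """
    if payload_bytes < 0:
        raise ValueError("payload_bytes must be non-negative")
    if bus_width_bits <= 0 or bus_width_bits % 8 != 0:
        raise ValueError("bus_width_bits must be a positive multiple of 8")
    bytes_per_beat = bus_width_bits // 8
    # Integer ceiling division: a partially filled last beat still counts.
    return (payload_bytes + bytes_per_beat - 1) // bytes_per_beat


if __name__ == "__main__":
    payload = 4096  # illustrative payload size in bytes
    cfg_beats = axi_beats(payload, 32)    # 32-bit configuration interface
    data_beats = axi_beats(payload, 128)  # 128-bit data interface
    print(f"32-bit bus:  {cfg_beats} beats")
    print(f"128-bit bus: {data_beats} beats")
    print(f"ratio: {cfg_beats / data_beats:.1f}x")
```

Under these assumptions, and at equal clock frequency, the wider interface needs a quarter of the beats, which bounds the achievable speed-up of the data path at a factor of four.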
3 The Crypto-Tile RISC-V Environment
[Figure 2: diagram of the verification environment. Files to be modified by the user: run_sim.sh and Receipe.mk (the recipe), the C project, and the SV testbench (tc_crypto_tile_XXX.sv). Tools: QuestaSIM, GCC. Simulation environment (crypto_tile_SoC_tb.svh): clock and reset stimuli, SDMA stimuli, 128-bit DMA AXI4 Master emulator. Simulated hardware (crypto_tile_SoC.sv): CVA6 RISC-V, Crypto-Tile, Init RAM and UART, connected through a 32-bit AXI4 interconnect. Output files: output terminal, uart.log (UART data logging), simulate-batch-vsim.log.]
Fig. 2. Crypto-Tile Cycle-Accurate Verification Environment
The verification environment set-up is based on the one presented in [13], which exploits an automated framework for accelerating the design space exploration of hardware/software co-designs in heterogeneous digital systems. It simplifies the customization of design tools, improves designer productivity, and allows the evaluation of different hardware/software choices. The framework integrates the RISC-V toolchain and focuses on post-synthesis analyses for accurate power consumption evaluations. It helps design systems that are robust against Side-Channel Attacks [3], ultra-low-power and heterogeneous architectures, and space-grade Systems on Chip (SoCs) with varying technology outcomes. The environment architecture (Figure 2) relies on a makefile approach that requires three different inputs: 1) a C project (i.e. the software intended to be executed by the RISC-V processor); 2) a SystemVerilog (SV) testbench (to provide the necessary inputs to the hardware and simulate its interface with the rest of the world); 3) a Recipe, to configure the work environment. These inputs are fed directly to the framework, which compiles the C code and initialises a simulated RAM with the compiled executable; the SV testbench is then executed to start the cycle-accurate simulation. The testbench needs to provide service signals (e.g. clock and reset), and it may send data to the Crypto-Tile via the dedicated 128-bit DMA AXI4 emulator, achieving a data rate much higher than the one provided by the 32-bit AXI4 internal interconnect. The test environment is then responsible for collecting all the generated information, writing a log file of the simulation, and printing the UART output of the RISC-V processor into a dedicated file. To use the verification environment, the user shall perform the following steps:
1) Write the C code to be executed, exploiting the provided templates and referring to the Crypto-Tile documentation for details on drivers, register addresses, operation modes, etc. The code can use the functions available in the Crypto-Tile library or rely on a custom library. Note that the C code provides access only to the 32-bit AXI4 MCU interface on the processor. The DMAs are emulated in SystemVerilog and cannot be accessed via the C code; this limitation will be addressed with the Field Programmable Gate Array (FPGA) system. All the C source files must be added as new targets in the makefile, indicating the main C file (<c_test>.c) as the main target (<c_test>).
2) Write the SV testbench, which must include at least the clock (clk) and reset (rst) signals. DMA operations can also be included by exploiting the SystemVerilog Secure DMA (SDMA) functions for the AXI4 Master emulation. Note that no default synchronization mechanism between the RISC-V processor and the SDMA interfaces is provided; the Crypto-Tile’s internal signals, such as the Interrupt Request (IRQ) signals, can be used for this purpose. The name of the SV testbench file (e.g., <tc_crypto_tile_ID>.sv) constitutes the top-level and must be specified in the recipe (next step).
3) Run the simulation using the corresponding command. The logs of the compilation process and the simulation are made available in a dedicated folder (build), and they can be used to check for errors and other information.
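As a minimal sketch of how the logs produced in step 3 might be checked automatically, the Python helper below counts the lines of a log that contain given keywords. The message formats of the compilation and simulation logs are not described here, so the default keywords, the function name and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter


def summarise_log(text: str, keywords=("error", "warning")) -> Counter:
    """Count the lines of a log that contain each keyword (case-insensitive).

    The keywords are assumptions: adapt them to the message format that
    the compiler and the simulator actually produce.
    """
    counts = Counter({keyword: 0 for keyword in keywords})
    for line in text.splitlines():
        for keyword in keywords:
            # Whole-word match, so identifiers such as error_count are skipped.
            if re.search(rf"\b{re.escape(keyword)}\b", line, re.IGNORECASE):
                counts[keyword] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log excerpt, invented for this example.
    sample = "\n".join([
        "step 1: compiling C project",
        "WARNING: unused variable in test source",
        "step 2: running testbench",
        "ERROR: status register mismatch",
        "step 3: done",
    ])
    print(dict(summarise_log(sample)))
```

To scan a real log, its content would first be read from the build folder and then passed to `summarise_log` as a string.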
4 Simulations
Thanks to the simulation environment provided, we were able to establish a comprehensive test plan that stimulates the entire system by focusing on each developed accelerator and service system. Below, we provide a concise overview of the tests performed in each test category:
AXI Interfaces: This test category consists of 20 diverse tests that aim to thoroughly test the AXI communication infrastructure. The focus is mainly on the MCU AXI interconnect and on each of the AXI DMA interfaces (one for each cryptoprocessor). All supported AXI operations undergo comprehensive testing.
Global Management Unit: This test category comprises 58 distinct tests that execute write and read attempts on the register file, which is governed by the Crypto-Tile security specification. These tests highlight possible error situations and evaluate the security of the system against unauthorized attempts to access protected data (both read and write).
AES Cryptoprocessor: This test category focuses on the AES cryptoprocessor and includes 138 different tests. These tests cover all the different AES operative modes, such as ECB, OFB, CFB, CTR, XTS, CMAC, GCM, and CCM. Additionally, they involve the writing and reading of the cryptographically secured configuration and status registers.
ECC Cryptoprocessor: This test category focuses on the ECC cryptoprocessor and includes 62 different tests. The writing and reading of the secured configuration and status registers are tested intensively, together with all the supported ECC engine operational modes.
SHA Cryptoprocessor: This test category focuses on the SHA cryptoprocessor and includes 63 different tests. The reading and writing of the secured configuration and status registers are tested extensively, along with both the SHA-2 and SHA-3 operational modes, each with all the supported digest sizes.
RNG Cryptoprocessor: The final test category focuses on the RNG cryptoprocessor and includes 45 different tests. These tests heavily exercise the writing and reading of the secured configuration and status registers, as well as the various operational modes, such as changing the entropy source and the number generation mode.
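For reference, the per-category test counts listed above can be tallied as follows. The counts are those stated in this section; the dictionary layout and the names used are only an illustrative choice.

```python
# Number of tests per category, as stated in the test plan above.
TEST_PLAN = {
    "AXI Interfaces": 20,
    "Global Management Unit": 58,
    "AES Cryptoprocessor": 138,
    "ECC Cryptoprocessor": 62,
    "SHA Cryptoprocessor": 63,
    "RNG Cryptoprocessor": 45,
}


def total_tests(plan: dict) -> int:
    """Return the total number of tests across all categories."""
    return sum(plan.values())


if __name__ == "__main__":
    total = total_tests(TEST_PLAN)
    for category, count in TEST_PLAN.items():
        # Print each category with its count and its share of the plan.
        print(f"{category:<24} {count:>4}  ({100 * count / total:.1f}%)")
    print(f"{'Total':<24} {total:>4}")
```

The plan thus comprises 386 tests in total, with the AES cryptoprocessor accounting for the largest share (138 tests).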
5 Conclusions
This work presented a comprehensive cycle-accurate verification environment for the Crypto-Tile, a state-of-the-art cryptographic accelerator integrated into the EPI system. The verification environment provides a systematic approach for validating functionality and performance, by accurately modelling the behaviour of the hardware components and emulating the DMA operations in SystemVerilog. Developers can leverage this verification environment to simulate and verify their C-Code implementations in conjunction with the Crypto-Tile. The AXI4 MCU interface is the communication channel between the processor and the Crypto-Tile, while DMA operations are emulated to ensure accurate testing. Simulations can be performed by executing the designated command within the verification environment. The generated simulation logs, accessible in the specified build folder, facilitate result analysis and troubleshooting in case of compilation or simulation errors.
Acknowledgement
This work was partially funded by the European Union’s Horizon 2020 research and innovation programme “European Processor Initiative” (grant agreement No. 101036168, EPI SGA2) and partly supported by the Italian Ministry of University and Research (MUR) with the project CN4 - CN00000023 of the Recovery and Resilience Plan (PNRR) program, grant agreement No. I53C22000720001, and in the framework of the FoReLab project (Departments of Excellence).
References
1. Intel Software Guard Extensions (Intel SGX) Key Management on the 3rd Generation Intel Xeon Scalable Processor. Tech. rep., Intel (August 2019)
2. Coppolino, L., D’Antonio, S., Mazzeo, G., Romano, L.: A comprehensive survey of hardware-assisted security: From the edge to the cloud. Internet of Things 6, 100055 (2019)
3. Crocetti, L., Baldanzi, L., Bertolucci, M., Sarti, L., Carnevale, B., Fanucci, L.: A simulated approach to evaluate side-channel attack countermeasures for the Advanced Encryption Standard. Integration 68, 80–86 (September 2019)
4. Di Matteo, S., Baldanzi, L., Crocetti, L., Nannipieri, P., Fanucci, L., Saponara, S.: Secure Elliptic Curve Crypto-Processor for Real-Time IoT Applications. Energies 14(15) (2021)
5. Gupta, S.: An edge-computing based Industrial Gateway for Industry 4.0 using ARM TrustZone technology. Journal of Industrial Information Integration 33, 100441 (2023)
6. Kovc, M., et al.: European Processor Initiative: Europe’s Approach to Exascale Computing (2022)
7. McKeen, F., Alexandrovich, I., Berenzon, A., Rozas, C.V., Shafi, H., Shanbhogue, V., Savagaonkar, U.R.: Innovative Instructions and Software Model for Isolated Execution. vol. 10 (June 2013)
8. Nannipieri, P., Bertolucci, M., Baldanzi, L., Crocetti, L., Di Matteo, S., Falaschi, F., Fanucci, L., Saponara, S.: SHA2 and SHA-3 accelerator design in a 7 nm technology within the European Processor Initiative. Microprocessors and Microsystems 87 (2021)
9. Nannipieri, P., Di Matteo, S., Baldanzi, L., Crocetti, L., Belli, J., Fanucci, L., Saponara, S.: True Random Number Generator Based on Fibonacci-Galois Ring Oscillators for FPGA. Applied Sciences (Switzerland) 11(8) (2021)
10. Nannipieri, P., Di Matteo, S., Baldanzi, L., Crocetti, L., Zulberti, L., Saponara, S., Fanucci, L.: VLSI Design of Advanced-Features AES Cryptoprocessor in the Framework of the European Processor Initiative. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 30(2), 177–186 (2022)
11. Nannipieri, P., Crocetti, L., Di Matteo, S., Fanucci, L., Saponara, S.: Hardware Design of an Advanced-Feature Cryptographic Tile within the European Processor Initiative. IEEE Transactions on Computers pp. 1–14 (2023)
12. Pinto, S., Santos, N.: Demystifying Arm TrustZone: A Comprehensive Survey. ACM Computing Surveys (CSUR) 51(6), 1–36 (2019)
13. Zulberti, L., Di Matteo, S., Nannipieri, P., Saponara, S., Fanucci, L.: A Script-Based Cycle-True Verification Framework to Speed-Up Hardware and Software Co-Design: Performance Evaluation on ECC Accelerator Use-Case. Electronics (Switzerland) 11(22) (2022)