Using Electromagnetic Emanations for Variability Characterization in Flash-Based FPGAs

Jimmy Tarrillo, Jorge Tonfat, Fernanda Kastensmidt, Ricardo Reis
PPGC, PGMICRO - Instituto de Informática
Universidade Federal do Rio Grande do Sul – UFRGS
Porto Alegre, Brazil
{jtarriilo, jltseclen, fglima, reis}@inf.ufrgs.br

Abstract—An ElectroMagnetic Analysis (EMA) technique is applied to the ProASIC3E Flash-based FPGA (Field Programmable Gate Array) to measure delay variability. Measurements show that delay variations can reach 40% depending on the mapping, placement and routing used in the FPGA array, while the synthesis tool analysis shows differences lower than 7%. Comparisons between the use of the EMA technique in Flash-based and SRAM-based FPGAs are presented. The Flash-based FPGA configurable blocks and routing structures are modeled at the electrical level, and SPICE simulations are performed to compare the predicted variability with the measured one. Results obtained with EMA can help designers select different parts of the FPGA array, such as distinct mappings, placements and routing wires, according to the application, and provide valuable feedback to the FPGA manufacturer.

Index Terms—Variability, Electromagnetic Analysis, FPGA, Flash.

I. INTRODUCTION

FPGAs are programmable circuits that can be customized by the user to implement specific designs. FPGAs manufactured with flash technology are attractive because they offer fast time-to-market and high flexibility while being reprogrammable and nonvolatile. Their main advantage is that they hold their configuration content even without power supply. This may simplify the board and reduce its complexity and cost, since the bitstream does not need to be reloaded into the FPGA at each power-up cycle. Flash-based FPGAs are configured through a set of floating-gate (FG) switches that combine floating-gate transistors and pass transistors, which work as switches to configure the logic and routing connections.

Nanometer scaling of CMOS technology has led to increased process variability in circuits, reaching a point where it can be considered a major bottleneck to further scaling. There is a real need for process measurement and evaluation. Usually, in Application Specific Integrated Circuits (ASICs), devices are designed by analysing layout effects to characterize delay variability and leakage current using test-structure arrays. For FPGAs, when targeting the highest performance and lowest power dissipation, users need to know the variability along the die to better select the design mapping and placement. To measure this variability in FPGAs, we propose to implement customized designs in the programmable array.

A non-invasive ElectroMagnetic Analysis (EMA) technique proposed in [1] has been shown to be a promising method to characterize the variability in SRAM-based FPGAs. This method avoids adding extra logic inside the FPGA for measurements and data collection. No input or output blocks are needed to inject or collect data. The internally mapped and routed logic works at a specific frequency that is captured by an electromagnetic measurement system.

In this work, the EMA technique is applied to ProASIC3E Flash-based FPGAs from Actel/MicroSemi to measure delay variability and to compare it with previous results obtained in SRAM-based FPGAs from Xilinx (Spartan3). Comparing the structure of SRAM-based and Flash-based FPGAs, the main differences are the configurable switches, the configurable logic blocks, and the routing architectures. First, each configurable logic block of ProASIC3E FPGAs (named VersaTile) can implement any 3-input logic function. This is functionally equivalent to a 3-input Look-Up Table (3-LUT) used in SRAM-based FPGAs. Nevertheless, it is important to highlight the difference between the two electrical implementations: the VersaTile is composed of a set of combinational logic gates, while LUTs are mainly based on pass transistors and transmission gates. In addition, the routing architectures of SRAM-based and Flash-based FPGAs are also different, as they are composed of different hierarchies of wire segments. Finally, the switching elements change from SRAM cells to floating-gate transistors in Flash-based FPGAs. These cell types can behave very differently with respect to variability.

Variability analysis in SRAM-based FPGAs has been studied in [2,3,4]. Different configurations of ring oscillators (RO) are used as sensors to characterize delay variations in Altera Cyclone II and in Virtex 4 FPGAs (90nm technologies) in [2], Spartan 3E FPGAs (90nm technology) in [3] and Xilinx Virtex 5 LX (65nm technology) in [4]. To perform a circuit characterization, each single logic block of the FPGA is configured with a RO, creating an array of sensors. It is always reported that a measurement subsystem (counters and/or control logic) is implemented into the FPGA to extract the frequency from each oscillator.

Florent Bruguier, Morgan Bourrée, Pascal Benoit, Lionel Torres
LIRMM
Université Montpellier 2 – UMR CNRS
Montpellier, France
{florent.bruguier, morgan.bourree, pascal.benoit, lionel.torres}@lirmm.fr

A set of ROs composed of different logic gates was mapped and placed in distinct parts of the Flash-based FPGA matrix. The operating frequency of each oscillator was measured using the EMA technique. Then, results were compared to the predicted frequency provided by the synthesis tool. The Flash-based FPGA VersaTiles and routing structures were also modeled at the electrical description level. SPICE simulations were performed to compare the simulation results to the measurements. It is important to mention that there are no similar works in the literature on measuring the variability in a non-intrusive manner for Flash-based FPGAs.

The goal of the paper is to answer the following questions:
1. Is the Electromagnetic Analysis (EMA) technique presented in [1] suitable for variability characterization in Flash-based FPGAs?
2. What are the suitable logic configurations to characterize Flash-based FPGAs?
3. What are the differences observed in Flash-based compared to SRAM-based FPGAs?
4. What are the discrepancies between SPICE simulations and estimations from the synthesis tool?

The paper is organized as follows. The principles of the Electromagnetic Analysis method are explained in Section II. Section III is devoted to the design of test-case circuits for the ProASIC3E Flash-based FPGA. The variability evaluation by the EMA method is presented in Section IV. Finally, conclusions and suggestions for future research are drawn in Section V.

II. ELECTROMAGNETIC EMANATION ANALYSIS (EMA)

In the literature, the variability of SRAM-based FPGAs is measured with sensors placed along the array, together with a complementary subsystem for data processing, acquisition and communication [2, 3, 4], as depicted in Figure 1. This approach has been shown to be inadequate because the subsystem itself directly affects the variability measurements. As reported in [1], SRAM-based FPGAs were characterized twice with two different probe positions for the acquisition and communication subsystems. Two different cartographies were obtained and compared. The correlation between the two probe positions was lower than 75%. For the same probe position, the difference between the two measured frequencies was around 5%. Based on these results, it was concluded that the surrounding configured logic has too much impact on the measurement itself.
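The comparison of two cartographies can be sketched as a correlation between the frequencies measured at the same locations. The text does not say which correlation measure was used in [1]; the sketch below assumes a Pearson coefficient, and the frequency values are hypothetical placeholders rather than measured data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical RO frequencies (MHz) at the same six locations, obtained with
# two different positions of the readout subsystem (illustrative values only).
map_pos_a = [201.0, 198.5, 203.2, 199.8, 202.1, 200.4]
map_pos_b = [199.0, 200.9, 201.5, 198.2, 203.0, 199.1]

r = pearson(map_pos_a, map_pos_b)
print(f"correlation between the two cartographies: {100 * r:.1f}%")
```

A coefficient well below 100% for the same die indicates that the readout logic, and not only the process, shapes the measured map.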

In this way, electromagnetic analysis can be used to directly determine the switch activity of integrated blocks [5]. As shown in [6], the magnetic flux \( \Phi(t) \) depends on the instantaneous current value \( I(t) \) in the power/ground network. As a result, to characterize process variations in FPGAs, the method proposed here is based on ElectroMagnetic Analysis (EMA).

The experimental protocol is divided into three main steps, where the sensor is successively placed at each location to characterize the whole reconfigurable array:

- The process variation is first captured with an asynchronous sensor, which emits electromagnetic waves.
- This radiation is measured, amplified and collected by dedicated laboratory equipment (EM probe, low-noise amplifier, and an oscilloscope) (Figure 2).
- The signal is then processed to identify the RO frequency.

The method proposed in [1] is only based on a sensor to capture process variations, unlike other approaches requiring an internal measurement subsystem. For instance, a simple 3-inverter RO can be used. The emitted frequency of this asynchronous structure directly depends on the process capabilities.

In order to capture, measure, amplify and collect the electromagnetic emanations from the FPGA, a complete platform [7] has been deployed, allowing fine control of the environment (temperature, core voltage) and providing a high-performance measurement system, shown in Figure 2. It is composed of a high-frequency near-field probe from Rohde & Schwarz, connected to a 40dB low-noise amplifier from MITEQ. The amplified signal is then transmitted to a 3.5GHz bandwidth oscilloscope from LeCroy. An XYZ table is used to place the near-field probe accurately and to reproduce the experiments. Once collected, the data are transmitted from the oscilloscope to an external computer. Signal processing is then performed with Matlab. A Hanning window is first applied to reduce spectral leakage, and then a Fast Fourier Transform (FFT) is performed to convert the data from the time domain to the frequency domain. Finally, the power spectral density is analyzed to extract the frequency of the process sensor.
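The processing chain described above (window, FFT, spectral peak search) can be illustrated with a minimal sketch. It is written in Python rather than Matlab and is not the authors' script; the sample rate, record length, test frequency and interferer are assumptions chosen only to make the example self-contained.

```python
import cmath
import math


def hann_window(n):
    """Return the n-point symmetric Hann (Hanning) window."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dominant_frequency(samples, sample_rate_hz):
    """Estimate the strongest spectral line (DC excluded) of a real signal."""
    n = len(samples)
    windowed = [s * w for s, w in zip(samples, hann_window(n))]
    spectrum = fft(windowed)
    power = [abs(c) ** 2 for c in spectrum[: n // 2]]  # one-sided power spectrum
    peak_bin = max(range(1, n // 2), key=lambda k: power[k])
    return peak_bin * sample_rate_hz / n


# Synthetic stand-in for an oscilloscope trace (all values are assumptions):
# a 280 MHz RO line plus a weaker 50 MHz interferer, sampled at 10 GS/s.
SAMPLE_RATE_HZ = 10e9
N_SAMPLES = 4096
RO_FREQ_HZ = 280e6
trace = [
    math.sin(2 * math.pi * RO_FREQ_HZ * i / SAMPLE_RATE_HZ)
    + 0.3 * math.sin(2 * math.pi * 50e6 * i / SAMPLE_RATE_HZ)
    for i in range(N_SAMPLES)
]

estimate_hz = dominant_frequency(trace, SAMPLE_RATE_HZ)
print(f"estimated RO frequency: {estimate_hz / 1e6:.2f} MHz")
```

The estimate is quantized to the FFT bin spacing (sample rate divided by record length), so a longer record gives a finer frequency resolution.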

Note that this complete setup is fully automated with scripts, which control the thermal chamber, the core voltage, the XYZ position, the bitstream upload, the capture of electromagnetic waves and the signal processing. This procedure allows a customizable and flexible characterization of a given FPGA.

Fig. 1. Variability characterization approach in [2,3,4].

III. FLASH-BASED FPGAS DESIGN SETUP

In this paper, the EMA method is used to characterize the ProASIC3E Flash-based FPGA A3PE1500-PQ208, fabricated in a 130-nm Flash-based CMOS process. The part operates at 3.3V at the IO pins and 1.5V at the core. It is composed of an array of programmable logic tiles named VersaTiles surrounded by routing structures [8]. In order to exercise different configurations of the VersaTile and different routing resources, a set of logic designs was defined for the electromagnetic experiment. We propose to investigate the impact of logic function mapping, placement and routing on the variability.

A. ProASIC3E Architecture

VersaTiles and routing resources are programmed by switching ON or OFF switches implemented as floating gate (FG) transistors (NMOS transistor with a stacked gate). The FG control circuit is a set of two NMOS transistors: 1) a sense transistor to program the floating gate and sense the current during the threshold voltage measurement and 2) a switch transistor to turn ON or OFF a data-path in the FPGA. The two transistors share the same control gate and floating gate. The threshold voltage is determined by the stored charge in the FG. Figure 3 illustrates VersaTile and a possible set of configurations that can be used to implement some common logic gates.

Each VersaTile can implement any 3-input logic function, which is functionally equivalent to a 3-input Lookup Table (3-LUT). However, the electrical implementation of the VersaTile is totally different from that of a Lookup Table (LUT). Hence, the VersaTile may respond differently to variability effects than a 3-input LUT. The VersaTile can also implement a latch with clear and reset, or D flip-flop with clear or reset, or enable D flip-flop with clear and reset by using the logic gate transistors and feedback paths inside the VersaTile block. For each configuration of the VersaTile block, the number of FG switches and transistors in the critical path changes.

Using the Libero tool provided by Microsemi, the placement of each VersaTile can be performed automatically or manually. In this work, the placement was automatically configured by using the SDF file. The design placement and routing were graphically analyzed with the Libero/Designer/ChipPlanner tool. Depending on the placement, different routing resources are used, which implies a different number of FG switches in the routing and consequently variations in the propagation delays.

B. Test-case Circuits

The test-case circuit is a RO composed of three logic stages, as described in Figure 4. Three different ROs were used: one composed of inverters in two of the ring stages, one composed only of 2-input NAND gates, and another composed of 2-input NOR gates in the ring stages. In this way, the correlation between variability and the logic mapping in the VersaTile can be analyzed.

Designs were divided into two cases, shown in Figures 5 and 6, respectively. Case A represents a set of ROs (inverters, NAND2, NOR2) manually placed side by side vertically in the array with minimal-distance connections between the VersaTile stages. Case B represents a set of ROs (inverters, NAND2, NOR2) manually placed side by side horizontally in the array with minimal-distance connections between the VersaTile stages. In both cases, 50 different locations in the array were arbitrarily selected.

Fig. 4. Selected configurations of the ring-oscillators composed of inverters, NAND and NOR gates mapped into the VersaTiles of the ProASIC3E FPGA.

Fig. 5. Case A: Selected vertical placements of the ring-oscillators composed of inverters, NAND and NOR gates mapped into the VersaTiles of the ProASIC3E FPGA.

Fig. 6. Case B: Selected horizontal placements of the ring-oscillators composed of inverters, NAND and NOR gates mapped into the VersaTiles of the ProASIC3E FPGA.

IV. RESULTS

A. Experimental setup

To ensure reproducible results, the temperature and the voltage are kept constant at their nominal values during the whole acquisition process using a thermal chamber (Figure 7). Regarding the signal processing, it is important to note that the amplitude of the RO spectral line is directly linked to the probe position. However, using a differential algorithm between two position measurements, it is always possible to extract the information from the acquisition.

B. Comparison of several RO configuration in Flash-based FPGA

This method was successfully applied to two FPGA parts, referred to here as 1105 and 1116. The two parts are identical, with the same size and the same input and output pins. The three ROs were successively placed and moved by reconfiguration to each location as described before (Figure 5, Figure 6). From there, we obtained two circuit cartographies, as depicted in Figure 8. The cartographies show the intra-die and inter-die variability. For the given FPGA parts (part numbers 1105 and 1116), configured with the same type of RO and the same placement style, we observe very large local (intra-die) variations, up to 39.4% (120.6MHz). Such large variations are unusual for this technology (130nm) and may be the consequence of the routing resources used in the different RO configurations and of the flash-based switch structures. By comparison, results in [1] for SRAM-based FPGAs have shown intra-die variations up to 10% for 90nm technology (Spartan 3).

When comparing the results obtained from both FPGA parts for the same configuration (inter-die), the maximum frequency variation between two measurements is 24% (85MHz).
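The two metrics discussed above can be sketched as follows. The intra-die figure is computed here as the frequency spread divided by the mean, which matches the EMA rows of Table I (for example, 69 MHz over 279.7 MHz gives about 24.7%). The text does not state how the inter-die percentage is normalized, so the sketch reports only the largest difference in MHz. The frequency lists are hypothetical placeholders, not the measured cartographies.

```python
def intra_die_variation(freqs_mhz):
    """Return (spread in MHz, spread as a percentage of the mean) for one die."""
    spread = max(freqs_mhz) - min(freqs_mhz)
    mean = sum(freqs_mhz) / len(freqs_mhz)
    return spread, 100.0 * spread / mean


def inter_die_variation(die_a_mhz, die_b_mhz):
    """Return (largest |difference| in MHz, location index) between two dies.

    Both lists must hold the same RO type measured at the same locations.
    """
    if len(die_a_mhz) != len(die_b_mhz):
        raise ValueError("both dies must cover the same locations")
    diffs = [abs(a - b) for a, b in zip(die_a_mhz, die_b_mhz)]
    worst = max(range(len(diffs)), key=diffs.__getitem__)
    return diffs[worst], worst


# Hypothetical RO frequencies (MHz) at four locations on each part.
die_1105 = [300.0, 280.0, 320.0, 310.0]
die_1116 = [290.0, 285.0, 300.0, 330.0]

spread_mhz, spread_pct = intra_die_variation(die_1105)
worst_diff_mhz, worst_location = inter_die_variation(die_1105, die_1116)
print(f"intra-die: {spread_mhz:.1f} MHz ({spread_pct:.1f}% of the mean)")
print(f"inter-die: {worst_diff_mhz:.1f} MHz at location {worst_location}")
```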

Results are summarized in Table I. We have successfully measured variations among the different configurations of the two FPGA parts. The ROs based on NAND and NOR configurations are faster than the ones based on inverters. The variability depends on the VersaTile mapping and routing. This is an important result because it shows that no single test circuit is sufficient to measure the variability. A set of test circuits is needed to capture the range of variability that can be observed in a given device. This is especially true for Flash-based FPGAs, where each configuration of the VersaTile block uses different logic parts and a different number of floating-gate transistor switches in the logic path.

Fig. 8. Typical cartography for the ProASIC3E FPGA when ROs of inverters are mapped. Panels: RO using inverters in the FPGA part number 1105; RO using inverters in the FPGA part number 1116.

C. Comparison of EMA method and Synthesis Tool prediction

The Libero synthesis tool analyzed each RO design with a given placement and routing to estimate the maximum frequency. Table I summarizes the results. Note that in this case, the inter-die variations are estimated to be less than 7% considering the real design used. These variations are much lower than the ones measured by EMA. Finally, in almost all cases the mean frequency obtained is lower than the one obtained by EMA. The synthesis tool therefore normally guarantees the worst-case performance of the circuit.

D. Comparison of EMA method and SPICE simulations

Electrical simulations were performed in order to predict and estimate the variability observed in the experiment. First, standard CMOS electrical models were used to describe the programmable logic circuit (VersaTile and ultra-fast local resources) in a SPICE netlist. Second, the effects of variability, represented by technology-dependent transistor variations, were added to the original device models. The goal was to study, qualitatively and quantitatively, the variations between different VersaTile configurations observed in the experimental results.

The VersaTile logic was described in SPICE using PTM 130nm technology [9] with 1.5V power supply voltage. The transistors were sized to obtain a similar propagation delay as published in the datasheet of the component and estimated by the Designer Tool from Actel/MicroSemi.

<table>
<thead>
<tr>
<th>Ring Placement</th>
<th>INV</th>
<th>NAND</th>
<th>NOR</th>
</tr>
</thead>
<tbody>
<tr>
<td>Board</td>
<td>1105</td>
<td>1116</td>
<td>1105</td>
</tr>
<tr>
<td>EMA analysis</td>
<td>Mean Freq (MHz)</td>
<td>279.7</td>
<td>267.4</td>
</tr>
<tr>
<td></td>
<td>Max. Variation (MHz)</td>
<td>69</td>
<td>35.5</td>
</tr>
<tr>
<td></td>
<td>Max. Variation (%)</td>
<td>24.7</td>
<td>13.3</td>
</tr>
<tr>
<td>Libero Tool</td>
<td>Mean Freq (MHz)</td>
<td>271.48</td>
<td>263.29</td>
</tr>
<tr>
<td></td>
<td>Max. Variation (MHz)</td>
<td>1.49</td>
<td>17.93</td>
</tr>
<tr>
<td></td>
<td>Max. Variation (%)</td>
<td>0.65</td>
<td>6.81</td>
</tr>
<tr>
<td>SPICE Simulation (VersaTile Config1)</td>
<td>Freq (MHz) Corner case ff</td>
<td>442.48</td>
<td>400.00</td>
</tr>
<tr>
<td></td>
<td>Freq (MHz) Corner case fs</td>
<td>412.42</td>
<td>352.52</td>
</tr>
<tr>
<td></td>
<td>Freq (MHz) Corner case sf</td>
<td>227.91</td>
<td>204.35</td>
</tr>
<tr>
<td></td>
<td>Freq (MHz) Corner case ss</td>
<td>273.97</td>
<td>234.68</td>
</tr>
<tr>
<td></td>
<td>Freq (MHz) typical case</td>
<td>354.61</td>
<td>313.26</td>
</tr>
<tr>
<td></td>
<td>Max. Variation (%)</td>
<td>60.51</td>
<td>62.45</td>
</tr>
<tr>
<td>SPICE Simulation (VersaTile Config2)</td>
<td>Freq (MHz) Corner case ff</td>
<td>588.24</td>
<td>390.78</td>
</tr>
<tr>
<td></td>
<td>Freq (MHz) Corner case fs</td>
<td>527.70</td>
<td>343.48</td>
</tr>
<tr>
<td></td>
<td>Freq (MHz) Corner case sf</td>
<td>330.78</td>
<td>192.57</td>
</tr>
<tr>
<td></td>
<td>Freq (MHz) Corner case ss</td>
<td>369.00</td>
<td>233.10</td>
</tr>
<tr>
<td></td>
<td>Freq (MHz) typical case</td>
<td>473.93</td>
<td>307.69</td>
</tr>
<tr>
<td></td>
<td>Max. Variation (%)</td>
<td>54.32</td>
<td>64.42</td>
</tr>
</tbody>
</table>

For instance, a VersaTile configured as a 2-input NAND gate from the ProASIC3E macro library should have an average delay of 630 ps. When configured as an inverter, the delay is 540 ps, and when configured as a 2-input NOR gate, the delay is 650 ps. Each FG switch was implemented as an NMOS pass transistor and each multiplexer was implemented using transmission gates. Figure 9 illustrates the actual ProASIC3E switch and the simplified switch described in SPICE. Transistors were sized with Wnmos=390nm and Wpmos=780nm.
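As a rough illustration of how these gate delays relate to a ring oscillator frequency, the sketch below uses the standard relation for an odd-length ring, where the period is twice the sum of the stage delays. It assumes three identical stages and zero routing delay, neither of which holds for the measured circuits, so the figures are idealized values and not predictions of the EMA results.

```python
def ring_oscillator_mhz(stage_delays_ps):
    """Ideal RO frequency in MHz: the period is twice the sum of stage delays."""
    period_ps = 2.0 * sum(stage_delays_ps)
    return 1e6 / period_ps  # 1 / ps = 1e12 Hz = 1e6 MHz


# Average VersaTile delays quoted in the text (ps); routing delay is ignored.
GATE_DELAY_PS = {"INV": 540.0, "NAND2": 630.0, "NOR2": 650.0}

for gate, delay_ps in GATE_DELAY_PS.items():
    freq_mhz = ring_oscillator_mhz([delay_ps] * 3)  # assumed: 3 identical stages
    print(f"{gate}: {freq_mhz:.1f} MHz")
```

Any routing delay adds to the period, so a real ring built from the same gates would oscillate more slowly than this estimate.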

There are many different ways to map the same logic function in a VersaTile. In each VersaTile block, there are 32 switches to configure (examples of configurations for the inverter, the 2-input NAND gate and the 2-input NOR gate are described in Figure 2). The number associated with each switch indicates whether the switch is ON (1) or OFF (0). Two configurations are presented for each logic function: Config1 and Config2. For example, the function INV (see Figure 2) can be implemented by configuring the VersaTile to receive the input A at the X1 input, or at the X3 input. In each configuration, different paths with a different number of FG switches were selected.

The variability in 130 nm technologies was modeled with corner-case scenarios, where the Vth of the NMOS and PMOS transistors may vary by up to 8%. The scenarios are: fast-fast (ff), fast-slow (fs), slow-fast (sf) and slow-slow (ss). The set of ROs (INV, NAND2 and NOR2), each composed of 3 VersaTiles customized to implement the corresponding function using Config1 and Config2, was simulated. The VersaTiles are connected to each other through one basic routing cell that models the ultra-fast local net, while the loop connection is composed of three basic routing cells. The goal is to compare the delay variability in different circuit mappings. Table I presents the frequencies calculated from SPICE simulation in the four corners, the typical case and the maximum difference. Note that the variations observed in electrical simulations are of the same order as those observed with the EMA technique.
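The Max. Variation (%) rows in the SPICE part of Table I appear consistent with the spread between the fastest and slowest corners divided by the typical-case frequency. This normalization is inferred from the numbers and is not stated in the text. The sketch below reproduces it for the first data column of Table I.

```python
def corner_variation_percent(corners_mhz, typical_mhz):
    """Spread between fastest and slowest corner, as a % of the typical case."""
    values = list(corners_mhz.values())
    return 100.0 * (max(values) - min(values)) / typical_mhz


# SPICE frequencies (MHz) from the first data column of Table I.
config1 = {"ff": 442.48, "fs": 412.42, "sf": 227.91, "ss": 273.97}
config1_typical = 354.61
config2 = {"ff": 588.24, "fs": 527.70, "sf": 330.78, "ss": 369.00}
config2_typical = 473.93

var1 = corner_variation_percent(config1, config1_typical)
var2 = corner_variation_percent(config2, config2_typical)
print(f"Config1: {var1:.2f}%  Config2: {var2:.2f}%")
```

In both configurations the slowest result is the sf corner rather than ss, which is why the spread is taken over all four corners instead of assuming ff and ss as the extremes.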

V. CONCLUSION AND PERSPECTIVES

The EMA technique was successfully applied to the ProASIC3E Flash-based FPGA to characterize intra-die and inter-die variability. Three different RO structures were evaluated to build the FPGA cartographies, showing that the variations can reach up to 40% of the mean value, while the synthesis tool reported differences lower than 7%. To check the consistency of the EMA results, SPICE simulations were run to obtain the predicted variability: the variations observed in electrical simulations are of the same order as the ones observed with the EMA technique. EMA is a non-intrusive approach that can help designers efficiently select different portions of the FPGA array, and it provides valuable feedback to the FPGA manufacturer. Future work includes the use of EMA to analyze more precisely the degradation due to aging effects and radiation effects such as total ionizing dose.

REFERENCES