JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021
Compact XOR/XNOR-based Adders and BNNs
Utilizing Drain-Erase Scheme in Ferroelectric FETs
Musaib Rafiq, Graduate Student Member, IEEE, Yogesh Singh Chauhan, Fellow, IEEE, and Shubham Sahay,
Senior Member, IEEE
Abstract—Compact and energy-efficient computing avenues
such as in-memory computing and processing-in-memory (PIM)
are being actively explored to address the limitations of the conventional
von-Neumann computing systems. The recent advancements in
the field of emerging non-volatile memories (e-NVMs), such as
FeFETs, RRAMs, MRAMs, etc., have propelled the development
of the PIM technique where the logic operations are performed
in situ (where the operands are stored) to reduce the energy
draining data movement. Considering the promising potential
of the doped-hafnium oxide (HfO2) based FeFETs, such as
CMOS compatibility, high scalability, high integration density,
and field-driven programming capability, in this work, for the
first time, we propose a novel input-to-voltage mapping scheme
and exploit the drain-erase phenomenon to realize a compact and
energy-efficient majority logic gate using a single Fe-FDSOI
FET, and XOR and XNOR logic gates using two Fe-FDSOI FETs each.
Furthermore, utilizing the proposed FeFET-based XOR and
XNOR logic design, we demonstrate a compact implementation
of a half adder (using 3 FeFETs) and full adder (utilizing only
9 FETs) which outperforms the CMOS and prior eNVM-based
implementations in terms of area and energy. Moreover, we also
propose a modified XNOR-cell design utilizing 4 FeFETs for
performing bitwise count operations in binary neural networks.
Index Terms—Drain-Erase, Ferroelectric FETs, full adder,
processing-in-memory, XOR logic.
I. INTRODUCTION
THE unprecedented growth in the field of processing-in-
memory (PIM) architectures is attributed to the rapid
advancements in the emerging non-volatile memories (e-
NVMs) such as resistive random-access memory (RRAM),
phase change memory (PCM), ferroelectric (Fe)FETs, etc. [1],
[2]. Recently, CMOS-compatible doped-hafnium oxide-based
FeFETs have garnered significant attention for application in
memory and non-von-Neumann computing platforms owing
to their high program/erase speeds, high scalability, large
integration density, and multi-bit capability [3]–[5].
Considering the potential benefits of the FeFETs for PIM
implementation such as high scalability, high endurance, run-
time reconfigurability, low program/erase energy etc., in this
work, for the first time, we propose a novel input-to-voltage
mapping scheme (input assignments) for compact and energy-
efficient implementation of XOR, XNOR and 3-input ma-
jority logic gates by exploiting the drain erase scheme in
Fe-FDSOI FETs. The implementation of a 3-input majority
gate was initially reported in [7]. Furthermore, the proposed
FeFET-based XOR and XNOR logic gates serve as efficient
building blocks for highly compact half-adder and full adder
implementations, which require the least number of transistors
This work was partially supported by the Prime Minister’s Research Fel-
lowship (PMRF), Semiconductor Research Corporation (SRP Task 3056.001)
and Swarnajayanti fellowship (DST/SJF/ETA/02/17-18).
M. Rafiq, Y. S. Chauhan and S. Sahay are with the Department of Electrical
Engineering, Indian Institute of Technology Kanpur, Kanpur 208016, India.
(email: musaib20@iitk.ac.in, chauhan@iitk.ac.in, ssahay@iitk.ac.in).
Fig. 1. (a) Schematic of the Fe-FDSOI FET having W = 170 µm and L
= 24 µm. A 10 nm thick doped HfO2 layer is present in the front-gate
stack. (b) Polarization-voltage characteristics of the FE-Cap model calibrated to
the experimental results [6].
when compared to the conventional CMOS-design and the
prior implementations [8]. Moreover, we also propose a novel
4-FeFET based XNOR cell to perform the computationally
intensive multiply and accumulate (MAC) operations in binary
neural networks using bitwise counting (XNOR+POPcount)
while eliminating the additional weight fetching step required
by the prior XNOR-BNNs.
II. FE-FDSOI SIMULATION METHODOLOGY
The structure of a Fe-FDSOI FET utilizing doped-hafnium
oxide as the ferroelectric layer in the gate stack is shown
in Fig. 1(a). For emulating the Fe-FDSOI FET, we have
utilized a multi-domain Preisach model [9] for the FE ca-
pacitor connected to the experimentally calibrated industry
standard BSIM IMG model [10]. The model parameters of the
FE-Cap model are fine-tuned to reproduce the experimental
characteristics [6] (Fig. 1(b)). While the input voltage ramp
rate for the experimental data [6] is 24 mV/µs, we have used
a ramp rate of 6 mV/µs for the model.
Moreover, it was shown recently that the drain terminal can
also be used to alter the polarization state of the Fe layer in the
gate stack, enabling the vertical stacking of FeFET cells in a
3D NAND architecture [11]. Fig. 2 illustrates this drain-erase
operation: the FeFET is first initialized to a low VTH state by
applying a large positive voltage pulse at the gate terminal,
as shown in Fig. 2(a). After initialization, voltage pulses of
different amplitudes are applied to the drain terminal, with the
source terminal kept at 1.5 V to enhance the impact of drain-
erase [11]. Fig. 2(b) shows transfer (read) characteristics of
the FeFET corresponding to the different drain-erase voltage
pulses. From Fig. 2(b), it can be observed that the drain current
of the FeFET reduces as the amplitude of the drain-erase pulse
is increased.
III. IMPLEMENTATION OF THREE-INPUT MAJORITY GATE
For the implementation of the three-input majority gate, we
have utilized the ability of Fe-FDSOI FETs to program/erase
(PRG/ERS) via the application of appropriate voltage pulses at
This article has been accepted for publication in IEEE Journal of the Electron Devices Society. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JEDS.2024.3497147
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Fig. 2. Drain erase scheme: (a) voltage waveforms for drain erase scheme.
(b) Read characteristics for different drain erase pulses
Input A  Input B  Input C  Output
   0        0        0       0
   0        0        1       0
   0        1        0       0
   0        1        1       1
   1        0        0       0
   1        0        1       1
   1        1        0       1
   1        1        1       1
(Rows with Input A = 0 form a 2-input AND gate of B and C; rows with Input A = 1 form a 2-input OR gate of B and C.)
[Fig. 3(b) waveform labels: Gate, Source, Drain, Output; Phase I: applying inputs A, B and C; Phase II: reading output; 1.5 V for 1 µs, 0.1 V for 1 µs, 0.05 V for 1 µs.]
Fig. 3. (a) Truth table of a 3-input majority gate. (b) Voltage waveforms for
the proposed 3-input majority gate (here input case A = 1, B = 1, and C = 0
is shown).
the gate and drain terminals. The truth table of the three-input
(A, B, and C) majority gate is shown in Fig. 3(a). Looking at
the truth table, we find that when Input A is at logic level ‘0’,
the majority gate function can be realized as an AND operation
between inputs B and C, and when input A is at logic level
‘1’ the function can be realized as an OR operation between
inputs B and C. This can be written in an equation form as
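\begin{equation}
\mathrm{MAJ}(A, B, C) =
\begin{cases}
\mathrm{AND}(B, C), & \text{Input } A = 0 \\
\mathrm{OR}(B, C), & \text{Input } A = 1
\end{cases}
\tag{1}
\end{equation}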
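As a brief check of Equation 1, the standard sum-of-products form of the majority function can be expanded about input A (a Shannon expansion). This is textbook Boolean algebra added here for illustration, not a derivation taken from the paper:

```latex
\begin{align*}
\mathrm{MAJ}(A,B,C) &= AB + BC + CA \\
\mathrm{MAJ}(0,B,C) &= BC = \mathrm{AND}(B,C) \\
\mathrm{MAJ}(1,B,C) &= B + C + BC = B + C = \mathrm{OR}(B,C) \\
\mathrm{MAJ}(A,B,C) &= \bar{A}\,BC + A\,(B + C)
\end{align*}
```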
For the implementation of Equation 1, we have proposed
a novel input-to-voltage mapping scheme and exploited the
drain-erase phenomenon. First, input A needs to be encoded
appropriately to ensure that the Fe-FDSOI FET gets tuned to
AND-gate mode when A = 0 or OR-gate mode when A =
1. To realize this functionality, input A is encoded as logic
level ‘0’ by applying a -5 V pulse of 1 µs duration at the gate
terminal, whereas it is encoded as logic level ‘1’ by applying
a 5 V pulse of 1 µs duration at the gate terminal. On the other
hand, input B is encoded as logic level ‘0’ by applying a 0
V pulse for 1 µs at the gate terminal and logic level ‘1’ by
applying a 5 V pulse of 1 µs duration at the gate terminal of
the Fe-FDSOI FET. Furthermore, input C is encoded as logic
level ‘0’ by applying a 5 V pulse at the drain terminal and
is encoded as logic level ‘1’ by applying a 0 V pulse for 1
µs duration (negative logic). At first, input A is applied; after
that, inputs B and C are simultaneously applied at the gate
and drain terminal, respectively.
During the application of inputs B and C, the source
terminal of the Fe-FDSOI FET is kept at 1.5 V to increase
the effectiveness of the drain-erase scheme [11]. After the ap-
plication of inputs, the state of the Fe-FDSOI FET represents
the output of the majority gate. The output is encoded as the
current flowing through the Fe-FDSOI FET. The output state
is read by applying a read gate voltage of 0.1 V and a drain
voltage of 0.05 V. The high drain current flowing through
the Fe-FDSOI FET represents logic level ‘1’, and the low
drain current represents logic level ‘0’. The operation can be
understood as follows: input A = 0 (-5 V initial pulse) erases the Fe-FDSOI
FET, inducing a high VTH (low-current) state, and only
for the case B = 1 (5 V pulse) and C = 1 (0 V pulse) does the Fe-FDSOI
FET get programmed and enter a low VTH state
(high current, output = 1). Conversely, A = 1 (5 V pulse)
programs the device into a low VTH state
(high current), and only for the case B = 0 (0 V pulse) and C
= 0 (5 V pulse) does the drain-erase scheme kick in and switch the device
to a high VTH state (low current, output = 0). Thus,
the device performs an AND operation between inputs B and
C when A = 0 and performs an OR operation between inputs
B and C when A = 1.
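The two-step operation described above can be sketched as a small behavioral model. This is only an illustration under a strong simplifying assumption that is not from the paper: the polarization is taken to flip whenever the gate-to-drain voltage difference reaches +5 V (program) or -5 V (erase), and to stay unchanged otherwise. The names `V_SWITCH`, `apply_pulse` and `majority_gate` are hypothetical, and the source bias, gradual switching and read currents are ignored.

```python
from itertools import product

V_SWITCH = 5.0  # assumed switching threshold in volts (illustrative, not a device parameter)

def apply_pulse(state, v_gate, v_drain):
    """Return the FeFET state after one write pulse.

    'low_vth'  = programmed (high read current)
    'high_vth' = erased (low read current)
    """
    v_net = v_gate - v_drain
    if v_net >= V_SWITCH:      # gate high, drain grounded: program
        return "low_vth"
    if v_net <= -V_SWITCH:     # negative gate pulse or drain-erase
        return "high_vth"
    return state               # no net field across the stack: state retained

def majority_gate(a, b, c):
    """Single-FeFET majority gate using the input-to-voltage mapping in the text."""
    # Step 1: input A on the gate (-5 V for '0', +5 V for '1'), drain grounded.
    state = apply_pulse("high_vth", 5.0 if a else -5.0, 0.0)
    # Step 2: input B on the gate (0 V / 5 V) and input C on the drain
    # in negative logic (5 V for '0', 0 V for '1').
    state = apply_pulse(state, 5.0 if b else 0.0, 0.0 if c else 5.0)
    # Read phase: a low-VTH device conducts, which is read as logic '1'.
    return 1 if state == "low_vth" else 0

if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, "->", majority_gate(a, b, c))
```

Under these assumptions the model reproduces the truth table of Fig. 3(a): the first pulse selects the AND-like or OR-like starting state, and the second pulse changes it only for B = C = 1 or B = C = 0.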
Fig. 3(b) shows the voltage waveforms of the entire operation
for a representative input case. The output characteristics
of the proposed three-input majority gate for all possible
combinations of inputs are shown in Fig. 4(a)-(h). The maximum
and the average read energy required to perform the logic
operation are 1.9 pJ and 0.577 pJ, respectively.
IV. XOR/XNOR IMPLEMENTATION
The truth table of a two-input XOR gate is shown in Fig.
5(a). We propose a novel input-to-voltage mapping scheme
(Fig. 5(c)) where the inputs A and B are divided into sub-inputs
A1-A2 and B1-B2, respectively, to implement the XOR
gate using two Fe-FDSOI FETs. While the sub-inputs A1 and
A2 are applied to the gate terminals of the Fe-FDSOI FETs,
the sub-inputs B1 and B2 are applied to the drain terminals
as shown in Fig. 5(b). A logic ‘0’ for input A is encoded by
applying pulses of 0 V and 5 V amplitude with a duration
of 1 µs at the gate terminals A1 and A2, respectively, while
logic level ‘1’ is encoded by applying pulses of 5 V and 0
V amplitude with a duration of 1 µs at A1 and A2, respectively.
Similarly, the logic ‘0’ for input B is encoded by applying
pulses of 0 V and 5 V amplitude with a duration of 1 µs at the
drain terminals B1 and B2, respectively, while logic level ‘1’
is encoded by applying pulses of 5 V and 0 V amplitude with
a duration of 1 µs at B1 and B2, respectively. For reading the
output, pulses with amplitudes of 0.1 V and 0.05 V are applied
at the gate and drain terminals, respectively, while keeping the
source terminal grounded. The output C is encoded as logic
‘0’ or ‘1’ depending on the current flowing through the source
terminal common to the two Fe-FDSOI FETs. When the inputs
are different (A = 0/1 and B = 1/0), the input-to-voltage
mapping scheme leads to the application of a 5 V pulse to the
gate terminal of one of the Fe-FDSOI FETs while its
drain terminal is grounded. This programs that
Fe-FDSOI FET to the low VTH state, and the consequent high
current through the source terminal gives output logic ‘1’.
On the other hand, when the inputs are the same (A = 0/1 and B
= 0/1), both Fe-FDSOI FETs maintain their initial reset state
since 0 V is applied to the gate and drain terminals of one
Fe-FDSOI FET while 5 V is applied to the gate and drain
terminals of the other Fe-FDSOI FET. The voltage waveforms
for the proposed XOR implementation for all possible input
combinations are shown in Figs. 6(a)-(h).
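A minimal behavioral sketch of this two-FeFET XOR mapping is given below. It assumes, purely for illustration, that a device is programmed when its gate-to-drain voltage difference reaches +5 V, is erased at -5 V, and otherwise keeps its reset (high VTH) state. The helper names `apply_pulse`, `encode` and `xor_gate` are hypothetical and not from the paper.

```python
V_SWITCH = 5.0  # assumed switching threshold in volts (illustrative only)

def apply_pulse(state, v_gate, v_drain):
    """Idealized polarization update driven by the gate-to-drain voltage difference."""
    v_net = v_gate - v_drain
    if v_net >= V_SWITCH:
        return "low_vth"    # programmed: conducts during read
    if v_net <= -V_SWITCH:
        return "high_vth"   # erased: negligible read current
    return state

def encode(bit):
    """Split a logic input into its two sub-input voltages (X1, X2) as in Fig. 5(c)."""
    return (5.0, 0.0) if bit else (0.0, 5.0)

def xor_gate(a, b):
    a1, a2 = encode(a)   # gate voltages of FET 1 and FET 2
    b1, b2 = encode(b)   # drain voltages of FET 1 and FET 2
    fet1 = apply_pulse("high_vth", a1, b1)   # both devices start from the reset state
    fet2 = apply_pulse("high_vth", a2, b2)
    # The shared source current is high if either device was programmed.
    return int("low_vth" in (fet1, fet2))

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", xor_gate(a, b))
```

In this model exactly one device sees 5 V on its gate with a grounded drain when the inputs differ, while equal inputs place 0 V/0 V on one device and 5 V/5 V on the other, so neither changes state.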
The truth table of a two-input XNOR gate is shown in Fig.
7(a). We propose a similar input-to-voltage mapping scheme
Fig. 4. Outputs of the proposed majority gate for all possible input combinations are shown in (a)-(h). Sufficient separation between the output current levels
corresponding to logic ‘0’ and logic ‘1’ (>10^4) is observed.
Input A  Input B  Output C
   0        0        0
   0        1        1
   1        0        1
   1        1        0

Input   A1    A2        Input   B1    B2
A = 0   0 V   5 V       B = 0   0 V   5 V
A = 1   5 V   0 V       B = 1   5 V   0 V

Fig. 5. (a) The truth table and (b) the proposed circuit for the XOR gate.
Input A is divided into A1 and A2, while Input B is divided into B1 and
B2. (c) Input-to-voltage mapping of the proposed XOR circuit.
for FeFET-based XNOR gate implementation where inputs
A and B are divided into A1-A2 and B1-B2, respectively,
as shown in Fig. 7(b). The entire operation of the proposed
XNOR gate is similar to that of the proposed XOR gate, the only
difference being the input-to-voltage encodings of A1-A2.
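The effect of swapping the A1-A2 encodings can be illustrated with the same kind of idealized model, assuming a device switches only when its gate-to-drain voltage difference reaches +5 V or -5 V. The function `two_fet_cell` and its `swap_a` flag are hypothetical names introduced for this sketch.

```python
V_SWITCH = 5.0  # assumed switching threshold in volts (illustrative only)

def apply_pulse(state, v_gate, v_drain):
    """Idealized polarization update driven by the gate-to-drain voltage difference."""
    v_net = v_gate - v_drain
    if v_net >= V_SWITCH:
        return "low_vth"
    if v_net <= -V_SWITCH:
        return "high_vth"
    return state

def two_fet_cell(a, b, swap_a=False):
    """Two-FeFET cell: swap_a=False gives the XOR mapping, swap_a=True the XNOR mapping."""
    a1, a2 = (5.0, 0.0) if a else (0.0, 5.0)   # XOR encoding of A (Fig. 5(c))
    if swap_a:
        a1, a2 = a2, a1                        # XNOR encoding of A (Fig. 7(c))
    b1, b2 = (5.0, 0.0) if b else (0.0, 5.0)   # B encoding is identical for both gates
    fet1 = apply_pulse("high_vth", a1, b1)
    fet2 = apply_pulse("high_vth", a2, b2)
    return int("low_vth" in (fet1, fet2))

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "XOR:", two_fet_cell(a, b), "XNOR:", two_fet_cell(a, b, swap_a=True))
```

Only the gate-side encoding changes between the two gates, which is why the same pair of devices can serve either function.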
The voltage and output current waveforms for different
input combinations are shown in Figs. 8(a)-(e). For imple-
menting the proposed input-to-voltage mapping, an additional
inverter is required for providing complementary inputs which
increases the overall transistor count. Moreover, while the
inputs are encoded in the voltage domain in the proposed logic
design, the output is encoded in the current domain. There-
fore, for driving the next stage i.e. for cascading, the output
current levels of the preceding stage need to be converted to
appropriate voltage levels before providing it as input to the
next stage. Hence, the proposed implementation necessitates
peripheral read-out circuits that utilize sense amplifiers similar
to the prior processing-in-memory implementations utilizing
emerging non-volatile memories [12]–[14]. To achieve this,
a clocked pull-up p-type FET may be connected between the
supply and output node [8] or a simple read-out circuitry using
an inverting amplifier and a buffer can be used [15]. Fur-
thermore, the readout circuitry can be shared among various
logic functions in array architecture leading to a more efficient
design while maintaining appreciable performance levels [15],
[2]. Furthermore, all the logic-in-memory operations are per-
formed in a pulsed-manner by providing the inputs as pulses
at the gate/drain terminals with 1µs duration and a rise and
fall time of 100 ns. The application of inputs perturbs the
polarization-state of the FeFET which is measured by sensing
the output drain current. The energy consumption of the
proposed FeFET-based XOR and XNOR gate implementations
is also reported for different input combinations in Table I.
The FeFET-based XOR/XNOR gates consume a
low (worst case) energy of 2.33 nJ.
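The worst-case figure can be cross-checked against the per-input entries of Table I with a few lines of arithmetic. The sketch below uses the XOR row and takes the read energy for A = 1, B = 0 as 48.5 fJ, which is what the listed total of 0.213 nJ implies; the variable names are illustrative.

```python
AJ, FJ, NJ = 1e-18, 1e-15, 1e-9  # unit prefixes in joules

# (write energy, read energy) per input combination for the XOR gate, from Table I
xor_energy = {
    (0, 0): (2.33 * NJ, 31.9 * AJ),
    (0, 1): (0.213 * NJ, 48.5 * FJ),
    (1, 0): (0.213 * NJ, 48.5 * FJ),
    (1, 1): (2.33 * NJ, 31.9 * AJ),
}

# Total energy per operation is the sum of the write and read contributions.
totals = {inputs: write + read for inputs, (write, read) in xor_energy.items()}
worst_case = max(totals.values())

# Largest share of any total that is due to the read phase.
max_read_share = max(read / (write + read) for write, read in xor_energy.values())

if __name__ == "__main__":
    print(f"worst-case total energy: {worst_case / NJ:.2f} nJ")
    print(f"largest read share of the total: {max_read_share:.2e}")
```

The read energy is several orders of magnitude below the write energy in every case, so the total is effectively set by the write phase.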
We have used a gate and drain voltage of Vg = Vd = 5 V in
our proof-of-concept demonstration to ensure a large separa-
tion between the output (drain) current values corresponding to
logic ‘0’ and logic ‘1’ while keeping the output drain current
corresponding to logic ‘1’ at an appreciable level which may
be easily detected by a simple sense amplifier. These voltage
levels can be reduced significantly by using dual-port FeFETs
where the read operation can be performed using the back
gate [16] or using novel design techniques such as the use
of Ge-channel [17]. Moreover, programming/erase voltages
above 5 V have been utilized in scaled FeFETs for realizing a
high-speed operation and a large memory window [18], [19].
V. ADDER IMPLEMENTATIONS
A. Half Adder
The truth table for a half-adder is shown in Fig. 9(a) and
the corresponding implementations of the sum (S = A ⊕ B) and
carry bit (Co = A·B) utilizing the Fe-FDSOI FETs are shown in
Fig. 9(b). The proposed half adder implementation consists of
three Fe-FDSOI FETs: while two Fe-FDSOI FETs are required
to compute the sum (XOR operation), the third Fe-FDSOI FET
is used to compute the carry bit by performing an AND operation
in parallel. While the operation of the Fe-FDSOI FET-based
XOR gate has been discussed in Section IV, the AND gate
operation can also be performed utilizing a single Fe-FDSOI
FET exploiting the drain-erase scheme with a novel input-
to-voltage mapping scheme. For AND gate operation, input
A is encoded as logic level ‘0’ (‘1’) by applying a pulse of
magnitude 0 V (5 V) with 1µs duration at the gate terminal
while input B is encoded as logic level ‘0’ (‘1’) by applying
a pulse of magnitude 5 V (0 V) and 1µs duration at the drain
terminal. The drain current represents the output which is read
by applying a gate voltage of 0.1 V and a drain voltage of 0.05
Fig. 6. Input-to-voltage mapping scheme for the proposed XOR circuit for all possible input combinations is shown in (a)-(d), and the corresponding output
waveforms are shown in (e)-(h).

TABLE I
READ AND WRITE ENERGY OF TWO-INPUT XOR AND XNOR GATES FOR DIFFERENT INPUT COMBINATIONS

Logic  Inputs        Write Energy  Read Energy  Total Energy
XOR    A=0 and B=0   2.33 nJ       31.9 aJ      2.33 nJ
XOR    A=0 and B=1   0.213 nJ      48.5 fJ      0.213 nJ
XOR    A=1 and B=0   0.213 nJ      48.5 fJ      0.213 nJ
XOR    A=1 and B=1   2.33 nJ       31.9 aJ      2.33 nJ
XNOR   A=0 and B=0   0.213 nJ      48.5 fJ      0.213 nJ
XNOR   A=0 and B=1   2.33 nJ       31.9 aJ      2.33 nJ
XNOR   A=1 and B=0   2.33 nJ       31.9 aJ      2.33 nJ
XNOR   A=1 and B=1   0.213 nJ      48.5 fJ      0.213 nJ
Input A  Input B  Output C
   0        0        1
   0        1        0
   1        0        0
   1        1        1

Input   A1    A2        Input   B1    B2
A = 0   5 V   0 V       B = 0   0 V   5 V
A = 1   0 V   5 V       B = 1   5 V   0 V

Fig. 7. (a) The truth table and (b) the proposed circuit for the XNOR gate.
Input A is divided into A1 and A2, while Input B is divided into B1 and
B2. (c) Input-to-voltage mapping for the proposed XNOR circuit.
V. The novel input-to-voltage mapping scheme for the AND
gate ensures that the Fe-FDSOI FET gets programmed to the
low VTH state (and exhibits a high drain current) only for the
input case A = B = 1 (5 V at the gate and 0 V at the drain terminal)
and shows negligible drain current for the other combinations of
inputs. Fig. 9(c) and Fig. 9(d) show the output waveforms
for the sum and the carry block, respectively, for different
input combinations. The proposed half-adder implementation
consumes a maximum energy of 96 fJ during the operation.
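The three-FeFET half adder can be sketched behaviorally by combining the two-device XOR mapping with the single-device AND mapping described above. As before, this is an illustration under the assumption that a device switches only when its gate-to-drain voltage difference reaches +5 V or -5 V; all function names are hypothetical.

```python
V_SWITCH = 5.0  # assumed switching threshold in volts (illustrative only)

def apply_pulse(state, v_gate, v_drain):
    """Idealized polarization update driven by the gate-to-drain voltage difference."""
    v_net = v_gate - v_drain
    if v_net >= V_SWITCH:
        return "low_vth"
    if v_net <= -V_SWITCH:
        return "high_vth"
    return state

def xor_gate(a, b):
    """Sum bit: two FeFETs with complementary sub-inputs on gates (A) and drains (B)."""
    a1, a2 = (5.0, 0.0) if a else (0.0, 5.0)
    b1, b2 = (5.0, 0.0) if b else (0.0, 5.0)
    states = (apply_pulse("high_vth", a1, b1), apply_pulse("high_vth", a2, b2))
    return int("low_vth" in states)

def and_gate(a, b):
    """Carry bit: one FeFET, A on the gate (0 V / 5 V), B on the drain in negative logic."""
    v_gate = 5.0 if a else 0.0
    v_drain = 0.0 if b else 5.0
    return int(apply_pulse("high_vth", v_gate, v_drain) == "low_vth")

def half_adder(a, b):
    """Return (sum, carry); both blocks are evaluated from the same inputs."""
    return xor_gate(a, b), and_gate(a, b)

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", half_adder(a, b))
```

In this model the AND device is programmed only for A = B = 1, matching the single high-current case described in the text.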
B. Full Adder
The truth table for a 1-bit full adder is shown in Fig. 10(a).
The general expression for the sum of a 1-bit full adder can
be given as:
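\begin{equation}
\begin{aligned}
\mathrm{SUM} &= C_i \oplus (A \oplus B) \\
&= C_i (AB + \bar{A}\bar{B}) + \bar{C}_i (\bar{A}B + A\bar{B})
\end{aligned}
\tag{2}
\end{equation}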
Utilizing the proposed FeFET-based XOR and XNOR gates,
we propose a compact two-phase implementation of the SUM
expression which consists of four Fe-FDSOI FETs and two
FDSOI FETs as shown in Fig. 10(b). While the four Fe-FDSOI
FETs are required to perform the XOR and XNOR operations
between the inputs A and B, the outputs of the XOR and
XNOR blocks are applied to the drain terminals of two FDSOI
FETs M1 and M2 for the AND operation with the complement of Ci
and with Ci, respectively. The input Ci is divided into Ci1 and Ci2 and
encoded as logic level ‘0’ (‘1’) by applying a 1 V (0 V) and
0 V (1 V) pulse of duration 1 µs at the gate terminals of M1
(representing Ci1) and M2 (representing Ci2), respectively,
as shown in Fig. 10(c). The entire process of computing the
SUM bit is divided into two phases: the results of the XOR
and XNOR operations are obtained in the first phase while the
input Ci is applied as Ci1 and Ci2 and the output is read in
the second phase as shown in Fig. 11.
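The selection performed by M1 and M2 can be sketched at the logic level as follows. The XOR and XNOR results are taken as already available from phase I, and the pass transistors are modeled as ideal switches with a hypothetical turn-on threshold `V_ON`; neither the threshold value nor the function names come from the paper.

```python
V_ON = 0.5  # hypothetical turn-on threshold (V) for the FDSOI pass transistors M1 and M2

def encode_ci(ci):
    """Return the gate voltages (Ci1 for M1, Ci2 for M2) for carry-in bit ci."""
    # Ci = 0 -> (1 V, 0 V): M1 conducts; Ci = 1 -> (0 V, 1 V): M2 conducts.
    return (0.0, 1.0) if ci else (1.0, 0.0)

def sum_bit(a, b, ci):
    # Phase I: XOR and XNOR of A and B (outputs of the two FeFET blocks).
    xor_out = a ^ b
    xnor_out = 1 - xor_out
    # Phase II: Ci is applied to the gates of M1 and M2 and the output is read.
    v_ci1, v_ci2 = encode_ci(ci)
    via_m1 = xor_out if v_ci1 > V_ON else 0    # M1 passes the XOR result when Ci = 0
    via_m2 = xnor_out if v_ci2 > V_ON else 0   # M2 passes the XNOR result when Ci = 1
    return via_m1 | via_m2

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for ci in (0, 1):
                print(a, b, ci, "->", sum_bit(a, b, ci))
```

Exactly one of M1 and M2 conducts for a given Ci, so the block behaves as a 2:1 selection between the XOR and XNOR results.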
Now, the carry operation of the 1-bit full adder represented
by:
\begin{equation}
C_{out} = AB + BC_i + AC_i
\tag{3}
\end{equation}
can be performed by utilizing the proposed AND gate
implementation with three Fe-FDSOI FETs as shown in Fig.
12(a). The input-to-voltage encoding scheme for the carry
operation is listed in Fig. 12(b). Similar to the SUM operation,
the carry computation is also divided into two phases:
the inputs A, B, and Ci are applied in phase I for the AND
operation while read biases are applied in the second phase to
obtain the carry bit. The output waveforms for different input
combinations are shown in Fig. 13. The sum and carry bit of
the proposed 1-bit full adder are computed at the same time.
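A logic-level sketch of the carry block is given below. It assumes that each of the three Fe-FDSOI FETs realizes one product term of Equation 3 using the single-device AND mapping from the half adder, and that the three read currents add at a common output node. The actual encodings are those of Fig. 12(b), which are not reproduced in the text, so this pairing and the names `and_cell` and `carry_block` are illustrative assumptions.

```python
V_SWITCH = 5.0  # assumed switching threshold in volts (illustrative only)

def and_cell(x, y):
    """One FeFET as an AND gate: x on the gate (0 V / 5 V), y on the drain (5 V / 0 V)."""
    v_gate = 5.0 if x else 0.0
    v_drain = 0.0 if y else 5.0
    # Starting from the reset (high VTH) state, the device is programmed only
    # when the gate sits at 5 V while the drain is grounded.
    return int(v_gate - v_drain >= V_SWITCH)

def carry_block(a, b, ci):
    """Return (carry bit, number of conducting FeFETs) for Cout = AB + B.Ci + A.Ci."""
    cells = [and_cell(a, b), and_cell(b, ci), and_cell(a, ci)]
    n_conducting = sum(cells)
    # Summed currents act as a wired OR: any conducting device yields logic '1'.
    return int(n_conducting >= 1), n_conducting

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for ci in (0, 1):
                print(a, b, ci, "->", carry_block(a, b, ci))
```

Under this assumption the number of conducting devices depends on the input combination (one for two-high inputs, three for A = B = Ci = 1), which is one plausible reason a carry output of logic ‘1’ would not always correspond to the same current level.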
The total energy consumption of the proposed full adder
for all possible input cases is reported in Table II. As can
be observed from Fig. 8(e), while the output of the XNOR
gate shows consistent currents for logic ‘0’ and logic ‘1’,
the current levels for output logic ‘0’ and logic ‘1’ for the
carry operation in the full adder implementation are different for
different input combinations, as shown in Fig. 13(b). Therefore,
to ensure consistency in the proposed logic design, we use 100
nA as the reference current level for the readout circuitry and
classify the output current level above 100 nA as logic ‘1’ and
Fig. 8. Input-to-voltage mapping for the proposed XNOR circuit for all possible input combinations are shown in (a)-(d). (e) Output waveforms with current
in log scale for XNOR gate with 100 nA as reference.
Fig. 9. (a) Truth table and (b) proposed circuit for sum and carry operation
in the half adder. (c) Input voltage and (d) output current waveforms for all
combination of inputs for the proposed half adder.
output current less than 100 nA as logic ‘0’. Moreover, we
have benchmarked the performance of the proposed full adder
against the recent stand-alone PIM implementations based on
eNVMs in Table III. The proposed FeFET-based full adder
implementation requires only 9 transistors and exhibits the
least footprint as compared to the prior implementations. The
proposed logic design methodology can be extended to logic-
in-memory array architectures with shared readout circuitry
such as [20]. The output of the logic functions encoded as
the polarization-state of the FeFETs can be sensed using
readout circuitry and then applied as next set of inputs to
the array. Moreover, logic operations with a common input
can be performed simultaneously by applying the common
input to the word lines (gate terminals) while applying the
other inputs to different bit lines (drain terminals) in an AND-
array architecture. Extensive array-level design analysis and
exploration of the proposed FeFET-based logic-in-memory
implementation is an important future work. Similar to the
prior works on FeFETs [8], [21], our in-house developed
compact model also exhibits a switching time of 1 µs.
Furthermore, we have used rise/fall times of 100 ns, input/read
pulse widths of 1 µs, an initial reset pulse duration of 1 µs
and applied the inputs 1 µs after the application of the reset
pulse. Also, the read operation for determining the output of
the logic gate is performed 500 ns after the application of
the input pulses to account for the read-after-write delay in
FeFETs caused by parasitic charge trapping [22]. Therefore,
the proof-of-concept demonstration of the proposed FeFET-
based logic-in-memory implementation takes 4.8 µs. However,
many recent works have used a pulse width less than 300 ns
TABLE II
READ ENERGY CONSUMPTION OF 1-BIT FULL ADDER FOR DIFFERENT INPUT COMBINATIONS

Inputs  Read Energy for Sum  Read Energy for Carry  Total Read Energy
000     0.086 aJ             0.1195 aJ              0.2055 aJ
001     122.5 fJ             2.34 fJ                124.84 fJ
010     122.5 fJ             2.34 fJ                124.84 fJ
011     0.086 aJ             156 fJ                 156 fJ
100     122.5 fJ             2.34 fJ                124.84 fJ
101     0.086 aJ             156 fJ                 156 fJ
110     0.086 aJ             156 fJ                 156 fJ
111     122.5 fJ             400 fJ                 522.5 fJ
to switch between the extreme polarization-states in FeFETs
[18], [23]. Furthermore, recently, a switching time of 300 ps
for a pulse amplitude of 4.5 V was shown experimentally
utilizing proper impedance matching technique and dedicated
RF probes [24]. It was also reported that the switching time
of 300 ps is limited by the measurement capabilities, and
the minimum switching time should be 1-10 ps based on
the nucleation propagation theory [24]. Moreover, by using
SiNx as an interfacial layer, which results in negligible charge
trapping [25] and also by applying an appropriate standby bias
[26], the read-after-write delay can be substantially reduced
to 10 ns. Therefore, the latency of the proposed FeFET-based
logic-in-memory implementation can be reduced significantly.
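The average read energy quoted for this work in Table III can be reproduced from the per-input entries of Table II. The sketch below takes the SUM read energy for input 111 as 122.5 fJ, which is what the listed total of 522.5 fJ implies, and converts the average to power by assuming it is spent over a single 1 µs read pulse; that conversion is an assumption made here, not a statement from the paper.

```python
AJ_IN_FJ = 1e-3  # 1 aJ expressed in fJ

# Read energies in fJ for inputs 000, 001, ..., 111 (values from Table II)
sum_read_fj = [0.086 * AJ_IN_FJ, 122.5, 122.5, 0.086 * AJ_IN_FJ,
               122.5, 0.086 * AJ_IN_FJ, 0.086 * AJ_IN_FJ, 122.5]
carry_read_fj = [0.1195 * AJ_IN_FJ, 2.34, 2.34, 156.0, 2.34, 156.0, 156.0, 400.0]

# Total read energy per input combination, then the average over all eight.
total_read_fj = [s + c for s, c in zip(sum_read_fj, carry_read_fj)]
avg_fj = sum(total_read_fj) / len(total_read_fj)

READ_PULSE_S = 1e-6                                  # assumed 1 µs read pulse width
avg_power_uw = avg_fj * 1e-15 / READ_PULSE_S * 1e6   # average power in µW

if __name__ == "__main__":
    print(f"average read energy: {avg_fj:.2f} fJ")
    print(f"average read power:  {avg_power_uw:.2f} µW")
```

The result is about 170.6 fJ and 0.17 µW, in line with the "avg" entry for this work in Table III.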
VI. XNOR-BITCOUNT BASED BNN ACCELERATOR
A. Implementation
Although the deep neural networks (DNNs) and convolu-
tional neural networks (CNNs) show unprecedented accuracy
surpassing the human capabilities for tasks such as image
classification, natural language processing, and speech recog-
nition [27], they are compute- and resource-intensive owing
to the huge number of multiply and accumulate operations
(MAC) [28]. This restricts their deployment on the resource-
constrained IoT edge devices with conventional von-Neumann
computing engines. To reduce the computational complexity of
the DNNs and CNNs and enable their deployment on IoT edge
devices with limited area and energy, network quantization
techniques where weights and activations are restricted to
two (binary) values were proposed recently [29], [30]. This
quantization technique in the binary neural networks facilitates
simplified execution of the computationally demanding
multiply and accumulate (MAC) operations via bitwise counting
(which involves XNOR+POPcount) [30]. These XNOR-bitcount
based BNNs exhibit a significant reduction in the
computational workload and the memory for storing parameters
without significantly compromising the accuracy. The
Fig. 10. (a) Truth table and (b) proposed circuit for the full adder sum block. XOR and XNOR blocks used in the full adder sum block are marked by pink and
green dashed boundaries. (c) The input-to-voltage mapping scheme for the Ci input.
Fig. 11. (a) Representation of voltage waveforms of the two phases for input A=1, B=0 and Ci=0 for the proposed full adder sum block. Output current waveforms
for all possible input combinations are shown in (b)-(i).
Fig. 12. (a) Circuit for the carry block of the proposed full adder and (b) the
input-to-voltage encodings utilized for its implementation.
energy-efficiency and the performance of these BNNs may be
further improved by utilizing in-memory computing architec-
ture based on eNVMs specifically tailored for XNOR Neural
Networks (XNNs). To this end, in this work, we also propose
a FeFET-based XNOR cell for compact and energy-efficient
BNN accelerators. The compact and energy-efficient FeFET-
based XNOR gate implementation can be used to perform
the XNOR operation of the weights and inputs. However, the
polarization-state of the Fe-FDSOI FETs changes according to
the XNOR output and the information regarding the weights
is lost. Therefore, the weights need to be fetched again for
the XNOR operation with the next set of inputs during the
inference leading to additional energy-hungry data movements.
Although the weights can be stored in an SRAM cell and
fetched whenever required for XNOR operation similar to
[31], the transistor count increases considerably for one cell
TABLE III
PERFORMANCE BENCHMARKING OF 1-BIT FULL ADDER

Device Technology             No. of devices used               Read Energy (or power consumed)
FeFET [8]                     20 FETs (7 FeFETs and 14 FETs)    avg: 1.42 fJ, min: 0.42 fJ
FeFET [32]                    32 FETs (28 FETs and 4 FeFETs)    0.54 fJ (0.27 µW)
FeFET [32]                    20 FETs (17 FETs and 3 FeFETs)    0.42 fJ (0.21 µW)
STT+SHE (1) [33]              52 FETs and 4 MTJs                1.23 fJ
SHE-MTJ (2) [34]              23 FETs and 3 SHEs                40.68 fJ (13.6 µW)
Memristor [35]                9 memristors (3)                  6.4 nJ
CMOS [36]                     >14 FETs                          60.6 µW
Dynamic CMOS [37]             16 FETs                           2.14 µW
Reconfigurable FETs (4) [38]  8 RFETs                           NA
This Work (5)                 9 FETs (7 FeFETs and 2 FETs)      avg: 170.62 fJ (0.17 µW), min: <aJ

(1) STT: Spin-transfer torque, SHE: Spin-Hall effect
(2) MTJ: Magnetic Tunnel Junction
(3) Requires 43 steps
(4) Additional gates needed to drive the transmission gates are neglected
(5) Inverter count not included
Overhead of data transfer not included
of the XNN array degrading the area-efficiency of the BNN
accelerator.
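The XNOR+POPcount substitution for the MAC operation can be illustrated in a few lines. With weights and activations restricted to -1 and +1 and stored as bits 0 and 1, the dot product equals twice the number of matching bit positions minus the vector length. The function names below are illustrative, and the sketch is a software analogue only, not a model of the FeFET array.

```python
def xnor_popcount_dot(weights, inputs):
    """Dot product of two {-1, +1} vectors stored as bits (0 for -1, 1 for +1)."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    # XNOR is 1 where the bits match; POPcount sums those ones.
    matches = sum(1 - (w ^ x) for w, x in zip(weights, inputs))
    return 2 * matches - len(weights)

def signed_dot(weights, inputs):
    """Reference multiply-and-accumulate on the decoded {-1, +1} values."""
    return sum((2 * w - 1) * (2 * x - 1) for w, x in zip(weights, inputs))

if __name__ == "__main__":
    w = [1, 0, 1, 1]
    x = [1, 1, 0, 1]
    print(xnor_popcount_dot(w, x), signed_dot(w, x))
```

Because the weight bits are reused for every new input vector during inference, a cell that loses its weight after each XNOR evaluation has to reload it each time, which is the data-movement overhead discussed above.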
To address this bottleneck, we have proposed a modified
structure of XNOR cell specifically tailored for XNNs as
shown in Fig. 14(a). It consists of four Fe-FDSOI FETs
where two (computing) Fe-FDSOI FETs (C1 and C2) are
used for performing XNOR operation while the remaining two
(memory) Fe-FDSOI FETs (M1 and M2) are used to store
weights and eliminate the weight fetching step and the related
This article has been accepted for publication in IEEE Journal of the Electron Devices Society. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/JEDS.2024.3497147
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
JOURNAL OF L
A
T
E
X CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 7
(a) (b)
Fig. 13. (a) Representation of input voltage waveforms for input A=1 B=0
and Ci=0 for the full adder carry block. Output waveforms with current in
log scale for carry operation of the full-adder are shown in (b).
Fig. 14. (a) Schematic of the circuit for XNOR cell in XNN. (b) Read
characteristics of XNOR cell.
energy-hungry data movements. First, the binary weight is
programmed as the polarization state of M1 and M2. To store
weight ‘0’, a positive programming pulse of duration 50 µs
is applied at the gate terminal of M1, and the gate terminal
of M2 is grounded. To store weight ‘1’, the gate terminal
of M1 is grounded, and a positive programming pulse of
duration 50 µs is applied at the gate terminal of M2. Next,
to perform the XNOR operation, the input I is divided into
two sub-inputs, I1 and I2. Logic ‘0’ (‘1’) for input I
is encoded by applying pulses of 0 V (5 V) and 5 V (0 V)
amplitude with a duration of 1 µs at the drain terminals I1 and
I2, respectively. The result of the XNOR operation is stored
as the polarization state of the FE layer in the gate stack of
C1 and C2, while the weights are preserved in the memory
Fe-FDSOI FETs (M1 and M2) for the inference operation with
the next set of inputs. To obtain the output, appropriate read
voltages are applied to all four Fe-FDSOI FETs. Fig. 14(b)
shows the read characteristics of the proposed XNOR cell of
the XNN: the output drain current is high when both
inputs are the same and low when the inputs are different.
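The encoding described above can be summarized with a small logic-level sketch. It is only a behavioral abstraction and does not model the device physics or the actual circuit wiring: the names `XnorCell`, `program_weight`, `encode_input`, and `read_output` are hypothetical, and the pairing of I1 with M2 and of I2 with M1 is an assumption chosen so that the result reproduces the XNOR behavior of Fig. 14(b).

```python
from dataclasses import dataclass
from typing import Dict, Tuple

V_HIGH = 5.0  # input pulse amplitude stated in the text (V)
V_LOW = 0.0


@dataclass(frozen=True)
class XnorCell:
    """Logic-level view of the stored weight (no device physics)."""
    m1_programmed: bool  # positive gate pulse applied to M1 (weight '0')
    m2_programmed: bool  # positive gate pulse applied to M2 (weight '1')


def program_weight(weight: int) -> XnorCell:
    """Store a binary weight complementarily in M1 and M2."""
    if weight not in (0, 1):
        raise ValueError("weight must be 0 or 1")
    return XnorCell(m1_programmed=(weight == 0), m2_programmed=(weight == 1))


def encode_input(bit: int) -> Tuple[float, float]:
    """Return the (I1, I2) voltages: '0' -> (0 V, 5 V), '1' -> (5 V, 0 V)."""
    if bit not in (0, 1):
        raise ValueError("input must be 0 or 1")
    return (V_HIGH, V_LOW) if bit == 1 else (V_LOW, V_HIGH)


def read_output(cell: XnorCell, bit: int) -> int:
    """Return 1 for a high read current and 0 for a low read current.

    ASSUMPTION: I1 is paired with M2 and I2 with M1, so the high sub-input
    meets the programmed memory device only when input and weight match.
    """
    i1, i2 = encode_input(bit)
    high = (i1 == V_HIGH and cell.m2_programmed) or (
        i2 == V_HIGH and cell.m1_programmed
    )
    return int(high)


def truth_table() -> Dict[Tuple[int, int], int]:
    """Map (weight, input) to the read output for all four combinations."""
    return {
        (w, i): read_output(program_weight(w), i)
        for w in (0, 1)
        for i in (0, 1)
    }
```

Because `XnorCell` is immutable, repeated calls to `read_output` leave the stored weight untouched, mirroring the role of M1 and M2 in preserving the weight across successive inputs.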
B. Impact of Weight Flipping
Ferroelectric FETs are also known to exhibit spatial
(device-to-device) variation due to process non-uniformity.
Therefore, applying the same programming pulse to
different FeFETs in the proposed XNOR-cell array may result
in different polarization states. As a result, some of the binary
weights may not be programmed to their desired
polarization state, resulting in flipped weights during the
inference operation in XNNs. We have performed a comprehensive
analysis of the impact of weight flipping in the proposed
FeFET-based XNN by carrying out extensive network-level
simulations.
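One simple way to emulate such programming errors in a network-level simulation is to invert the sign of a randomly chosen subset of the binary weights. The sketch below is an illustration under stated assumptions, not the procedure used in this work: the helper name `flip_weights`, the ±1 weight representation, and the choice to flip an exact fraction (rather than flipping each weight independently with some probability) are all assumptions.

```python
import random
from typing import List


def flip_weights(weights: List[int], flip_fraction: float, seed: int = 0) -> List[int]:
    """Return a copy of +1/-1 weights with a random fraction sign-flipped.

    ASSUMPTION: exactly round(flip_fraction * N) distinct weights are flipped.
    """
    if not 0.0 <= flip_fraction <= 1.0:
        raise ValueError("flip_fraction must lie in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible experiments
    n_flip = round(flip_fraction * len(weights))
    flipped = list(weights)  # leave the caller's list untouched
    for idx in rng.sample(range(len(weights)), n_flip):
        flipped[idx] = -flipped[idx]
    return flipped
```

For example, `flip_weights([1] * 100, 0.2)` returns a list in which 20 of the 100 entries have been inverted, which can then be loaded into a binary layer before evaluating the accuracy.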
Fig. 15. Architecture of the neural network used in the
network-level simulations.
Fig. 16. Impact of weight flipping in the binary Conv2d and
Linear layers on accuracy.
The architecture used to perform the network-level
simulations of the proposed XNOR neural network is shown in Fig.
15. This architecture is an adaptation of XNOR-Net [30]
integrated into a modified LeNet5 framework. It consists of
two convolutional layers (one binary) and two linear layers
(one binary). The MAC operations in the binary layers were
performed using bitwise count operations (XNOR + popcount)
on binary weights and activations [30]. Network-level
simulations of the proposed XNOR-based BNN without weight
flipping indicate that an accuracy of 99.21% can be achieved
on the MNIST dataset. Fig. 16 shows the impact of weight
flipping in the binary convolutional and linear layers on the
accuracy. Flipping weights in both the binary convolutional and
binary linear layers leads to a significant degradation in
network accuracy: the accuracy degrades to 89% when
20% of the weights in both layers are randomly flipped.
We have also analyzed the
sensitivity of the network accuracy to weight flipping in the
binary convolutional layer and the binary linear layer separately.
The network accuracy is more sensitive to weight flipping
in the convolutional layer than to weight flipping
in the linear layer, as shown in Fig. 16. However, weight flipping
confined to a single layer (Linear or Conv2d) does not degrade
the network accuracy significantly.
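The bitwise count (XNOR + popcount) operation used in the binary layers can be illustrated with a short sketch. It assumes the common BNN convention in which bit '1' represents +1 and bit '0' represents -1; the function names `xnor_popcount_dot` and `signed_dot` are hypothetical and serve only to show that the two formulations agree.

```python
from typing import Sequence


def xnor_popcount_dot(weight_bits: Sequence[int], activation_bits: Sequence[int]) -> int:
    """Binary dot product computed as 2 * popcount(XNOR) - N.

    ASSUMPTION: bit 1 encodes +1 and bit 0 encodes -1.
    """
    if len(weight_bits) != len(activation_bits):
        raise ValueError("operands must have the same length")
    n = len(weight_bits)
    # XNOR is 1 where the bits match; popcount counts those matches.
    matches = sum(1 for w, a in zip(weight_bits, activation_bits) if w == a)
    return 2 * matches - n


def signed_dot(weight_bits: Sequence[int], activation_bits: Sequence[int]) -> int:
    """Reference dot product on the equivalent +1/-1 values."""
    def to_signed(bit: int) -> int:
        return 1 if bit else -1

    return sum(to_signed(w) * to_signed(a) for w, a in zip(weight_bits, activation_bits))
```

Each matching bit pair contributes +1 and each mismatch contributes -1, so a vector of length N with m matches yields m - (N - m) = 2m - N, which is why only the XNOR outputs need to be counted.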
VII. CONCLUSION
In this work, utilizing novel input-to-voltage mapping
schemes, we have proposed compact implementations of logic
circuits such as a 3-input majority gate (using one FeFET), XOR
and XNOR gates (using only two FeFETs), a half adder (using
only 3 FeFETs), and a full adder (using only 9 FETs), which
outperform the prior implementations in terms of area and
energy. We have also proposed a novel FeFET-based XNOR
cell that eliminates the weight fetching step and the
related energy-hungry data movements while performing
bitwise count operations in BNNs. The impact of weight flipping
on the accuracy of image classification on the MNIST
dataset was also evaluated for the proposed FeFET-based XNOR
neural network. We believe that our promising results may
provide the incentive for the experimental realization of
area- and energy-efficient in-memory Boolean logic circuits using
ferroelectric FETs in this nascent era of computing.
REFERENCES
[1] J. Lee, B.-G. Park, and Y. Kim, “Implementation of boolean logic
functions in charge trap flash for in-memory computing,” IEEE
Electron Device Letters, vol. 40, no. 9, pp. 1358–1361, 2019,
doi:10.1109/LED.2019.2928335.
[2] Z. Yang, K. Pan, N. Y. Zhou, and L. Wei, “Scalable 2t2r logic
computation structure: Design from digital logic circuits to 3-d stacked
memory arrays,” IEEE Journal on Exploratory Solid-State
Computational Devices and Circuits, vol. 8, no. 2, pp. 84–92, 2022,
doi:10.1109/JXCDC.2022.3206778.
[3] H. Mulaosmanovic et al., “Ferroelectric field-effect transistors based on
HfO2: a review,” Nanotechnology, vol. 32, no. 50, p. 502002, sep 2021,
doi:10.1088/1361-6528/ac189f.
[4] M. Rafiq, T. Kaur, A. Gaidhane, Y. S. Chauhan, and S. Sahay,
“Ferroelectric fet-based time-mode multiply-accumulate accelerator: Design
and analysis,” IEEE Transactions on Electron Devices, vol. 70, no. 12,
pp. 6613–6621, 2023, doi:10.1109/TED.2023.3323261.
[5] M. Rafiq, S. Chatterjee, S. Kumar, Y. Singh Chauhan, and S. Sahay,
“Utilizing dual-port fefets for energy-efficient binary neural network
inference accelerators,” IEEE Transactions on Electron Devices, vol. 71,
no. 7, pp. 4381–4388, 2024, doi:10.1109/TED.2024.3405472.
[6] K. Ni, M. Jerry, J. A. Smith, and S. Datta, “A circuit compatible accurate
compact model for ferroelectric-fets,” in 2018 IEEE Symposium on VLSI
Technology, 2018, pp. 131–132, doi:10.1109/VLSIT.2018.8510622.
[7] M. Rafiq, Y. S. Chauhan, and S. Sahay, “Exploiting single
ferroelectric fet for efficient implementation of majority gate function
for approximate computing,” in 2024 8th IEEE Electron Devices
Technology & Manufacturing Conference (EDTM), 2024, pp. 1–3,
doi:10.1109/EDTM58488.2024.10511629.
[8] E. T. Breyer et al., “Compact fefet circuit building blocks
for fast and efficient nonvolatile logic-in-memory,” IEEE
Journal of the Electron Devices Society, vol. 8, pp. 748–756, 2020,
doi:10.1109/JEDS.2020.2987084.
[9] A. D. Gaidhane, R. Dangi, S. Sahay, A. Verma, and Y. S. Chauhan,
“A computationally efficient compact model for ferroelectric switching
with asymmetric non-periodic input signals,” IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, pp. 1–1,
2022, doi:10.1109/TCAD.2022.3203956.
[10] S. Chatterjee et al., “Ferroelectric fdsoi fet modeling for memory and
logic applications,” Solid-State Electronics, vol. 200, p. 108554, 2023,
doi:10.1016/j.sse.2022.108554.
[11] P. Wang, Z. Wang, W. Shim, J. Hur, S. Datta, A. I. Khan, and
S. Yu, “Drain–erase scheme in ferroelectric field-effect transistor—part
i: Device characterization,” IEEE Transactions on Electron Devices,
vol. 67, no. 3, pp. 955–961, 2020, doi:10.1109/TED.2020.2969401.
[12] E. Linn, R. Rosezin, S. Tappertzhofen, U. Böttger, and R. Waser,
“Beyond von neumann—logic operations in passive crossbar arrays
alongside memory operations,” Nanotechnology, vol. 23, no. 30, p.
305205, jul 2012, doi:10.1088/0957-4484/23/30/305205.
[13] A. Siemon et al., “Realization of boolean logic functionality
using redox-based memristive devices,” Advanced Functional Materials,
vol. 25, no. 40, pp. 6414–6423, 2015, doi:10.1002/adfm.201500865.
[14] S. Gao et al., “Implementation of complete boolean logic functions in
single complementary resistive switch,” Scientific Reports, vol. 5, no. 1,
p. 15467, Oct 2015, doi:10.1038/srep15467.
[15] Z.-R. Wang et al., “Functionally complete boolean logic in 1t1r resistive
random access memory,” IEEE Electron Device Letters, vol. 38, no. 2,
pp. 179–182, 2017, doi:10.1109/LED.2016.2645946.
[16] S. Chatterjee, S. Thomann, K. Ni, Y. S. Chauhan, and H. Amrouch,
“Comprehensive variability analysis in dual-port fefet for reliable
multi-level-cell storage,” IEEE Transactions on Electron Devices, vol. 69,
no. 9, pp. 5316–5323, 2022, doi:10.1109/TED.2022.3192808.
[17] D. Das et al., “A ge-channel ferroelectric field effect transistor with
logic-compatible write voltage,” IEEE Electron Device Letters, vol. 44,
no. 2, pp. 257–260, 2023, doi:10.1109/LED.2022.3231123.
[18] S.-C. Yan et al., “Multilevel cell ferroelectric hfzro finfet with
high speed and large memory window using alon interfacial layer,”
IEEE Electron Device Letters, vol. 44, no. 1, pp. 44–47, 2023,
doi:10.1109/LED.2022.3224949.
[19] S. De et al., “Random and systematic variation in nanoscale
hf0.5zr0.5o2 ferroelectric finfets: Physical origin and neuromorphic
circuit implications,” Frontiers in Nanotechnology, vol. 3, 2022,
doi:10.3389/fnano.2021.826232.
[20] B. Wu, H. Zhu, K. Chen, C. Yan, and W. Liu, “Mlim: High-performance
magnetic logic in-memory scheme with unipolar switching sot-mram,”
IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 70,
no. 6, pp. 2412–2424, 2023, doi:10.1109/TCSI.2023.3254607.
[21] H. Mulaosmanovic, D. Kleimaier, S. Dünkel, S. Beyer, T. Mikolajick,
and S. Slesazeck, “Ferroelectric transistors with asymmetric double gate
for memory window exceeding 12 v and disturb-free read,” Nanoscale,
vol. 13, pp. 16258–16266, 2021, doi:10.1039/D1NR05107E.
[22] E. Yurchuk et al., “Charge-trapping phenomena in hfo2-based fefet-type
nonvolatile memories,” IEEE Transactions on Electron Devices, vol. 63,
no. 9, pp. 3501–3507, 2016, doi:10.1109/TED.2016.2588439.
[23] T. Ali et al., “A multilevel fefet memory device based on laminated
hso and hzo ferroelectric layers for high-density storage,” in 2019
IEEE International Electron Devices Meeting (IEDM), 2019, pp. 28.7.1–
28.7.4, doi:10.1109/IEDM19573.2019.8993642.
[24] M. M. Dahan et al., “Sub-nanosecond switching of si:hfo2 ferroelectric
field-effect transistor,” Nano Letters, vol. 23, no. 4, pp. 1395–1400, 2023,
doi:10.1021/acs.nanolett.2c04706.
[25] M. Hoffmann et al., “Fast read-after-write and depolarization fields in
high endurance n-type ferroelectric fets,” IEEE Electron Device Letters,
vol. 43, no. 5, pp. 717–720, 2022, doi:10.1109/LED.2022.3163354.
[26] Z. Wang et al., “Standby bias improvement of read after write
delay in ferroelectric field effect transistors,” in 2021 IEEE
International Electron Devices Meeting (IEDM), 2021, pp. 19.3.1–19.3.4,
doi:10.1109/IEDM19574.2021.9720502.
[27] Y. LeCun, K. Kavukcuoglu, and C. Farabet, “Convolutional networks
and applications in vision,” in Proceedings of 2010 IEEE
International Symposium on Circuits and Systems, 2010, pp. 253–256,
doi:10.1109/ISCAS.2010.5537907.
[28] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet
classification with deep convolutional neural networks,” in Advances in Neural
Information Processing Systems, F. Pereira, C. Burges, L. Bottou, and
K. Weinberger, Eds., vol. 25. Curran Associates, Inc., 2012.
[29] M. Courbariaux and Y. Bengio, “Binarynet: Training deep neural
networks with weights and activations constrained to +1 or -1,”
CoRR, vol. abs/1602.02830, 2016. [Online]. Available:
http://arxiv.org/abs/1602.02830
[30] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “Xnor-net:
Imagenet classification using binary convolutional neural networks,”
CoRR, vol. abs/1603.05279, 2016. [Online]. Available:
http://arxiv.org/abs/1603.05279
[31] S. Yin, Z. Jiang, J.-S. Seo, and M. Seok, “Xnor-sram: In-memory
computing sram macro for binary/ternary deep neural networks,” IEEE
Journal of Solid-State Circuits, vol. 55, no. 6, pp. 1733–1743, 2020, doi:
10.1109/JSSC.2019.2963616.
[32] X. Yin, A. Aziz, J. Nahas, S. Datta, S. Gupta, M. Niemier,
and X. S. Hu, “Exploiting ferroelectric fets for low-power non-
volatile logic-in-memory circuits,” in 2016 IEEE/ACM International
Conference on Computer-Aided Design (ICCAD), 2016, pp. 1–8,
doi:10.1145/2966986.2967037.
[33] E. Deng et al., “High-frequency low-power magnetic full-adder
based on magnetic tunnel junction with spin-hall assistance,” IEEE
Transactions on Magnetics, vol. 51, no. 11, pp. 1–4, 2015,
doi:10.1109/TMAG.2015.2449554.
[34] A. Roohi, R. Zand, D. Fan, and R. F. DeMara, “Voltage-based
concatenatable full adder using spin hall effect switching,” IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems, vol. 36,
no. 12, pp. 2134–2138, 2017, doi:10.1109/TCAD.2017.2661800.
[35] F. M. Puglisi, L. Pacchioni, N. Zagni, and P. Pavan,
“Energy-efficient logic-in-memory 1-bit full adder enabled by a
physics-based rram compact model,” in 2018 48th European
Solid-State Device Research Conference (ESSDERC), 2018, pp. 50–53,
doi:10.1109/ESSDERC.2018.8486886.
[36] M. Aguirre-Hernandez and M. Linares-Aranda, “Cmos full-adders for
energy-efficient arithmetic applications,” IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, vol. 19, no. 4, pp. 718–721,
2011, doi:10.1109/TVLSI.2009.2038166.
[37] S. Kumar, S. Chatterjee, C. K. Dabhi, Y. S. Chauhan, and H. Amrouch,
“Nontraditional design of dynamic logics using fdsoi for
ultra-efficient computing,” IEEE Journal on Exploratory Solid-State
Computational Devices and Circuits, vol. 9, no. 1, pp. 74–82, 2023,
doi:10.1109/JXCDC.2023.3269141.
[38] L. Wind, M. Maierhofer, A. Fuchsberger, M. Sistani, and W. M. Weber,
“Realization of a complementary full adder based on reconfigurable
transistors,” IEEE Electron Device Letters, vol. 45, no. 4, pp. 724–727,
2024, doi:10.1109/LED.2024.3368110.