A low-latency and energy-efficient 4-bit absolute value
detector for brain-machine interface applications
Yiwei Zhao
Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang
University, Hangzhou, 310058, China
yiwei8@illinois.edu
Abstract. This research article aims to develop a 4-bit absolute value detector, balancing speed
and power efficiency, with potential applications in Brain-Machine Interface (BMI) systems.
The detector outputs a binary signal, indicating whether the absolute value of the input
surpasses a predefined threshold. The design integrates two primary modules: an absolute
value calculator and a comparator. Initially, the study focuses on enhancing the architecture of
a multiplexer-based adder for absolute value calculation and selecting an efficient comparator
structure, emphasizing least significant bit comparison. Further, the implementation of logic
gates using Complementary Metal-Oxide-Semiconductor (CMOS) technology is elaborated.
The research concludes by assessing the minimum delay achievable in the critical path,
quantified at 74.22 units, and investigating strategies to minimize energy consumption. This is
achieved by adjusting gate dimensions and supply voltage, aiming for a delay 1.5 times the
minimum. The energy expenditure of the critical path is extrapolated to estimate the overall
circuit consumption. The findings demonstrate that, at 1.5 times the minimal delay, the circuit
achieves a maximum energy savings of 62.8% with a supply voltage of 0.815V.
Keywords: 4-bit absolute value detector, Delay optimization, Energy consumption, BMI,
CMOS technology
1. Introduction
Brain-Machine Interface (BMI) stands as a technology with significant potential, which offers hope
for restoring physical mobility in individuals afflicted with severe motor impairments resulting from
brain injuries, neurological disorders, and limb amputation [1]. One challenge facing BMI technology
is the delicate balance between calculation speed and energy consumption [2]. To enhance the
efficiency of bioelectrical signal interpretation and facilitate quicker interactions with external devices,
a high processing speed is required, leading to increased electrical energy consumption. However, due
to the need for long-term and portable operation of implanted brain terminals, careful control of
energy usage is imperative. Implanted brain terminals cannot consume much energy, as they would be
powered by a battery, or even by bioelectricity [3]. The basic structure of a computational circuit includes
an abs-value detector. Therefore, researching the balance between the calculation speed and energy
consumption of the abs-value detector is crucial for BMI technology.
This research focuses on a low-delay design of a 4-bit abs-value detector and its optimization. First,
the paper introduces the design of the 4-bit abs-value detector in chapter 2. This part includes the
Proceedings of Urban Intelligence: Machine Learning in Smart City Solutions - CONFSEML 2024
DOI: 10.54254/2755-2721/65/20240465
© 2024 The Authors. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0
(https://creativecommons.org/licenses/by/4.0/).
choice of the minimal-delay structure among the possible alternatives and the optimization of the chosen
structure. It also shows how to implement the structure with CMOS. In chapter 3, the paper calculates
the minimal practical delay and the corresponding energy of the structure by sizing the gates and adjusting
the supply voltage of the circuit. It then estimates the minimal possible energy for the circuit at 1.5 times
the minimal delay in 3 ways: gate sizing, supply-voltage adjustment, and both combined. The work aims to
help improve the processing speed and battery life of current BMI brain-implanted terminals, which would
benefit patients suffering from brain injuries.
2. Structure of the abs-value detector and optimization
2.1. Basic logic of abs-value detector
The 4-bit abs-value detector consists of 2 parts: an abs-value calculator and a comparator [4]. The input x[n]
is passed through the abs-value calculator, and its absolute value is fed into the comparator, where it is
compared with the threshold value. If the abs-value of the input exceeds the threshold, the output
is logical “1”; otherwise the output is “0”. The process is shown in figure 1.
Figure 1. Overall structure of the 4-bit abs-value detector [4]
In this research, the threshold chosen for energy calculation is 011(3), but the comparator can
compare the input value with other 3-bit input thresholds as well.
The supplied voltage used for the circuit in this project is limited to V. The whole
circuit is implemented in CMOS logic. The input is assumed to be a 4-bit number in 2’s complement format,
which comes from a chain of unit inverters. The output bit is connected to a load
capacitance $C_L$, which is 32 times the capacitance of a unit-sized inverter. The unit-sized inverter
used for this research has the following parameters:
.
2.2. Structure of abs-value calculator and optimization
The abs-value calculator can be implemented as a MUX-based adder [5], as shown in figure 2. Bit A3
determines the sign of the input. If A3 is 0, the input is greater than or equal to 0 and the other bits
pass through the MUX directly. If A3 is 1, the input is negative, so the circuit flips the other bits and passes
them to an adder. For any number in 2’s complement format, its negation equals the bit-flipped number plus 1.
The result of the adder is then passed through the MUX.
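The flip-and-add-one path and the sign-controlled MUX can be sketched behaviourally as follows. This is only a minimal model, not the gate-level circuit: the function names `abs_value_3bit` and `to_pattern` are hypothetical, and the magnitude is assumed to be kept in 3 bits.

```python
def abs_value_3bit(x4: int) -> int:
    """Behavioural model of the MUX-based abs-value calculator.

    x4 is the raw 4-bit two's-complement pattern (0..15).
    Returns the 3-bit magnitude selected by the sign bit A3.
    """
    a3 = (x4 >> 3) & 1                      # sign bit A3
    low = x4 & 0b111                        # bits A2..A0
    negated = ((low ^ 0b111) + 1) & 0b111   # flip the bits, add 1, keep 3 bits
    return negated if a3 else low           # MUX: A3 selects the adder output


def to_pattern(value: int) -> int:
    """Encode a signed integer in -8..7 as a 4-bit two's-complement pattern."""
    return value & 0b1111


# Every input from -7 to 7 yields its absolute value.
for v in range(-7, 8):
    assert abs_value_3bit(to_pattern(v)) == abs(v)

# Assumption of this sketch: -8 has magnitude 8, which does not fit in
# 3 bits, so the model wraps it to 0.
print(abs_value_3bit(to_pattern(-8)))
```

Positive inputs bypass the adder entirely in this model, which mirrors the direct MUX path described above.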
Figure 2. Overall logic of MUX-based adder abs-value calculator (Photo/Picture credit: Original)
To minimize the delay and power utilization of the circuit, the structure of the circuit should be as
simple as possible. The calculation process of the adder is illustrated in figure 3.
Figure 3. Overall internal configuration of a 3-bit adder (Photo/Picture credit: Original)
To calculate the negation, inputs B2 and B1 and the carry-in C-1 are always 0, and B0 is always 1. So, the
structure can be simplified to figure 4.
Figure 4. Simplified structure of a 3-bit adder (Photo/Picture credit: Original)
Instead of 3 full adders, the simplified structure can be implemented with 3 half-adders. For each
half-adder, the truth table is as follows in table 1.
Table 1. Half-adder truth table

| Input A | Input B | Carry | Sum |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 0 |
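Based on that, the logic equations for the bits shown in the figure are $C_0 = A_0$,
$C_1 = A_1$ AND $C_0$, $S_0 = \overline{A_0}$, $S_1 = A_1$ XOR $C_0$, $S_2 = A_2$ XOR $C_1$,
where $A_i$ are the adder inputs, $C_i$ the carries and $S_i$ the sum bits.
The overall circuit for the adder is illustrated in figure 5.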
Figure 5. Detailed configuration of a 3-bit adder (Photo/Picture credit: Original)
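As a quick check of the simplification, a chain of three half-adders with the first one fed a constant 1 behaves as a 3-bit incrementer. The sketch below is a minimal model under that assumption; the names `half_adder` and `increment_3bit` are illustrative and do not come from the figures.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) of two single bits, matching the half-adder truth table."""
    return a ^ b, a & b


def increment_3bit(a2: int, a1: int, a0: int) -> tuple[int, int, int]:
    """Add 1 to the 3-bit input using three chained half-adders."""
    s0, c0 = half_adder(a0, 1)    # with B0 = 1: S0 = NOT A0, C0 = A0
    s1, c1 = half_adder(a1, c0)   # S1 = A1 XOR C0, C1 = A1 AND C0
    s2, _ = half_adder(a2, c1)    # S2 = A2 XOR C1, final carry is dropped
    return s2, s1, s0


# Exhaustive check against ordinary arithmetic modulo 8.
for value in range(8):
    bits = ((value >> 2) & 1, (value >> 1) & 1, value & 1)
    s2, s1, s0 = increment_3bit(*bits)
    assert (s2 << 2) | (s1 << 1) | s0 == (value + 1) % 8
```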
2.3. Structure of comparator and optimization
The comparator for the circuit is chosen between two types of 3-bit comparators: Starting from LSB
(Least Significant Bit) to MSB (Most Significant Bit) (called type I in following passages) and starting
from MSB to LSB (called type II). The structures are shown in figure 6.
Figure 6. Two types of comparators, with Type I displayed on the left and Type II on the right
(Photo/Picture credit: Original)
The type I comparator is chosen for the circuit for two reasons: 1. The type I comparator
consists of fewer gates, and thus saves energy. 2. The abs-value calculator produces its output bits
in order from LSB to MSB, so the comparator can process the LSB first, which decreases the
overall delay. Finally, after linking the different parts of the abs-value detector, the overall circuit diagram
is illustrated below in figure 7.
Figure 7. Overall circuit diagram (Photo/Picture credit: Original)
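The idea of comparing from LSB to MSB can be sketched as a generic ripple comparator. The bit-level recurrence below is an assumption chosen for illustration and is not claimed to be the gate structure of figure 6; the function name `greater_than_lsb_first` is hypothetical.

```python
def greater_than_lsb_first(a: int, t: int, width: int = 3) -> int:
    """Return 1 if a > t, scanning the bits from LSB to MSB.

    At each bit, a strictly larger bit of `a` sets the flag, a strictly
    smaller bit clears it, and equal bits pass the previous flag through.
    """
    gt = 0
    for i in range(width):                 # i = 0 is the LSB
        a_i = (a >> i) & 1
        t_i = (t >> i) & 1
        bit_greater = a_i & (1 - t_i)      # a_i = 1 and t_i = 0
        bit_equal = 1 - (a_i ^ t_i)        # XNOR of the two bits
        gt = bit_greater | (bit_equal & gt)
    return gt


# Exhaustive check for all 3-bit magnitudes and thresholds.
for a in range(8):
    for t in range(8):
        assert greater_than_lsb_first(a, t) == int(a > t)
```

Because the flag only depends on the bits seen so far, the low-order stages can settle while the higher bits of the abs-value calculator are still being computed.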
2.4. Implementation of gates and MUX with CMOS
Compared to the International Technology Roadmap for Semiconductors (ITRS), CMOS technology is
superior in terms of both delay and energy consumption [6]. Five types of gates are needed to
implement the designed circuit: inverter, 2-NAND, 2-NOR, 2-XOR and 2-MUX. The unit-sized
inverter used for this research has the following parameters: , ,
. Thus, the ratio of Wp to Wn is roughly 1.5:1.
To determine the circuit’s delay, the g-factor, h-factor and p-factor are needed for each gate
implemented with CMOS.
(1)
(2)
For each gate, the structure implemented by CMOS is shown below in figure 8.
Figure 8. CMOS implementation of gates (Photo/Picture credit: Original)
Based on equation (1) and (2), the g and h parameters are as follows (assume = ) in table
2 [7, 8].
Table 2. g, h factors of gates

| Gate | g | h |
|---|---|---|
| NOT (unit inverter) | 1 | 1 |
| NAND | 1.4 | 2 |
| NOR | 1.6 | 2 |
| XOR | 4 | 4 |
| MUX | | 4 |
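The NAND and NOR entries for g can be reproduced with the usual input-capacitance ratio, assuming the 1.5:1 ratio of Wp to Wn stated above and the standard rule that series transistors are doubled in width. This is a sketch under those assumptions and is not the paper's equations (1) and (2); the function names are illustrative.

```python
WP_OVER_WN = 1.5  # PMOS to NMOS width ratio of the unit inverter

# Input capacitance of the unit inverter, in units of the NMOS width.
INVERTER_CIN = 1 + WP_OVER_WN


def nand2_g() -> float:
    """2-input NAND: two series NMOS (width 2), parallel PMOS (width 1.5)."""
    return (2 * 1 + WP_OVER_WN) / INVERTER_CIN


def nor2_g() -> float:
    """2-input NOR: parallel NMOS (width 1), two series PMOS (width 3)."""
    return (1 + 2 * WP_OVER_WN) / INVERTER_CIN


print(round(nand2_g(), 3), round(nor2_g(), 3))
```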
3. Calculation of the critical path’s minimal energy at 1.5 times the minimal delay
The next step is to calculate the minimal possible energy of the circuit when its delay is
1.5 times the minimal delay.
3.1. Assumptions for the circuit calculations
To simplify the calculation, two assumptions are made. (a) All gates on the same stage are
assumed to have the same size as the corresponding gate on the critical
path, so that no non-critical path has a longer delay than the critical path. (b) Based
on assumption (a), the total energy of the whole circuit is assumed to be proportional to the critical
path’s energy.
Assumption (b) is grounded in the following rationale:
The formula for energy calculation is:
$E = \alpha_{0\to1} C V_{DD}^2$ (3)
Here $\alpha_{0\to1}$ is the probability that a node changes from 0 to 1, $\alpha_{0\to1} = P_0 \cdot P_1$ [9]. Thus, based on assumption (a),
the ratio of total energy consumption and energy consumption of the gate on the critical path for each
stage is:
(4)
The and is settled for the circuit. So, it is reasonable to take the average of to get the
ratio of total power usage of the entire detector and the critical path’s power usage.
3.2. Calculation of minimal delay and corresponding critical path’s energy
3.2.1. Minimal delay of critical path. The delay of the entire detector is estimated by
calculating the critical path’s delay [10]. The critical path of the designed circuit is illustrated in figure 9.
The inverter at the beginning is a size-1 inverter, since the input signal comes from a chain of size-1
inverters. The final load is $C_L$, which equals 32 unit-sized inverters.
Figure 9. Critical path topology (Photo/Picture credit: Original)
The formula for the delay is:
(5)
(6)
(7)
For V, the
term is monotonically increasing. So, for minimal delay, choose $V_{DD}$ =
1 V.
For the whole critical path, = 1 (unit-sized inverter) and is 32 (CL). Thus, .
There are two branches on the critical path, and based on assumption (a), = . And
. To get the minimal D, introduce a new parameter:
(8)
For minimal D, use mean value inequality, corresponding
(9)
Here, N represents the quantity of gates along the path.
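The mean value inequality step can be spelled out as follows. This is a sketch under stated assumptions: the symbols $f_i$ (the effort of stage $i$) and $F$ (the product of all stage efforts, fixed by the path) are names introduced here for illustration, and the size-dependent part of the delay is assumed to be the sum of the stage efforts.

```latex
\sum_{i=1}^{N} f_i \;\ge\; N \Bigl( \prod_{i=1}^{N} f_i \Bigr)^{1/N} = N F^{1/N},
\qquad \text{with equality if and only if } f_1 = f_2 = \dots = f_N = F^{1/N}.
```

Under these assumptions the sum is smallest when every stage carries the same effort, which is why a single common value can characterize the whole path.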
With equations (1), (2) and (5) to (9), the minimal delay of the critical path and the corresponding gate
sizes are shown in table 3.
Table 3. Sizes of gates when = 2.125

| Gate | Size |
|---|---|
| C1 | 1 |
| C2 | 0.5795 |
| C3 | 0.9122 |
| C4 | 0.4846 |
| C5 | 0.3862 |
| C6 | 0.5861 |
| C7 | 1.2456 |
| C8 | 1.6543 |
| C9 | 3.5153 |
| C10 | 5.3357 |
| C11 | 11.3384 |
| C12 | 15.0588 |
| CL | 32 |
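The tail of table 3 can be reproduced by sizing backwards from the load with a common stage effort, a standard logical-effort procedure. In the sketch below, reading 2.125 as the common stage effort and the gate types of the last eight stages (inverter, NOR, inverter, NAND, repeated) are assumptions inferred from the size ratios and the g values of table 2, because the critical-path figure is not reproduced here. Branching is ignored for these stages.

```python
STAGE_EFFORT = 2.125   # value given in the caption of table 3
C_LOAD = 32.0          # load of 32 unit-sized inverters

# ASSUMED logical efforts, listed from the output side (C12) back to C5.
ASSUMED_G = [1.0, 1.6, 1.0, 1.4, 1.0, 1.6, 1.0, 1.4]

# Reported sizes C12, C11, ..., C5 from table 3.
TABLE3_C12_TO_C5 = [15.0588, 11.3384, 5.3357, 3.5153,
                    1.6543, 1.2456, 0.5861, 0.3862]


def size_backwards(c_load: float, gate_efforts: list[float],
                   stage_effort: float) -> list[float]:
    """Walk from the load to the input: C_in = g * C_out / stage_effort."""
    sizes = []
    c_out = c_load
    for g in gate_efforts:
        c_in = g * c_out / stage_effort
        sizes.append(c_in)
        c_out = c_in
    return sizes


computed = size_backwards(C_LOAD, ASSUMED_G, STAGE_EFFORT)
for got, reported in zip(computed, TABLE3_C12_TO_C5):
    print(f"{got:8.4f}  (table 3: {reported})")
```

The earlier stages (C1 to C4) involve the XOR, the MUX and the branches, so they are left out of this sketch.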
The minimal possible delay for this circuit is 74.22 units. 1.5 times the minimal delay
is therefore 111.33 units.
3.2.2. Energy of critical path under minimal delay. According to equation (3), $\alpha_{0\to1}$ is needed to
calculate the energy of the critical path. $\alpha_{0\to1}$ is the probability that a node changes from 0 to 1 in one clock
cycle, $\alpha_{0\to1} = P_0 \cdot P_1$ [9].
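A minimal sketch of this factor, assuming the transition probability is the product of the probability of being at 0 and the probability of being at 1, with consecutive cycles treated as independent. The function name `switching_activity` is hypothetical.

```python
def switching_activity(p_one: float) -> float:
    """Probability of a 0 -> 1 transition for a node that is 1 with probability p_one."""
    p_zero = 1.0 - p_one
    return p_zero * p_one


# A uniformly random input bit is 1 half of the time.
print(switching_activity(0.5))
# A node that is 1 a quarter of the time switches less often.
print(switching_activity(0.25))
```

The factor peaks at 0.25 for a node that is equally likely to be 0 or 1 and falls to 0 for a node that never changes.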
For the circuit designed in this research, the input is considered a random 4-bit 2’s complement
number, and the threshold is 011. Based on that, the values of $\alpha_{0\to1}$ are given in table 4.
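Table 4. $\alpha_{0\to1}$ factor for each load when threshold is 011

| Load | $\alpha_{0\to1}$ |
|---|---|
| C1 | 0.25 |
| C2 | 0.25 |
| C3 | 0.25 |
| C4 | 0.25 |
| C5 | 0 |
| C6 | 0 |
| C7 | 0.1875 |
| C8 | 0.1875 |
| C9 | 0.2344 |
| C10 | 0.2344 |
| C11 | 0.2148 |
| C12 | 0.2148 |
| CL | 0.2148 |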
The energy of the critical path under minimal delay is obtained by applying equation (3) to the gate sizes in table 3 and the factors in table 4.
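The critical-path energy can be estimated from tables 3 and 4 with a short calculation. The sketch assumes that the energy of each load is its activity factor times its capacitance times the square of the supply voltage, that capacitance is measured in unit-inverter sizes, and that the entries of table 4 follow the same order as the loads of table 3. The names `GATE_SIZES`, `ALPHAS` and `critical_path_energy` are illustrative.

```python
# Loads C1..C12 and CL from table 3, in unit-inverter capacitances.
GATE_SIZES = [1, 0.5795, 0.9122, 0.4846, 0.3862, 0.5861,
              1.2456, 1.6543, 3.5153, 5.3357, 11.3384, 15.0588, 32]

# Activity factors from table 4, ASSUMED to be in the same order.
ALPHAS = [0.25, 0.25, 0.25, 0.25, 0, 0,
          0.1875, 0.1875, 0.2344, 0.2344, 0.2148, 0.2148, 0.2148]


def critical_path_energy(v_dd: float) -> float:
    """Sum of alpha * C * V_DD^2 over every load on the critical path."""
    switched_cap = sum(a * c for a, c in zip(ALPHAS, GATE_SIZES))
    return switched_cap * v_dd ** 2


energy_at_1v = critical_path_energy(1.0)
print(round(energy_at_1v, 3))
print(round(critical_path_energy(0.775), 3))
```

Under these assumptions the sum comes to roughly 15.9 units at 1 V, and keeping the same sizes at 0.775 V gives about 9.55 units, close to the first row of table 5.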
3.3. Minimizing energy consumption at 1.5 times the minimal delay by resizing gates and adjusting $V_{DD}$
3.3.1. Predictions about the relationship between $E$ and $V_{DD}$. In equation (5), the terms of $V_{DD}$ and gate
size are separate. Thus, it is reasonable to consider only $V_{DD}$ and find the possible range of $V_{DD}$ at first.
To achieve 1.5 times the minimal delay by adjusting only $V_{DD}$, it should satisfy the equation:
(10)
The corresponding $V_{DD}$ is 0.775 V, and the corresponding minimal energy is
9.555 units. Under this configuration, the delay obtained by sizing the gates has already reached its minimum, so no
$V_{DD}$ lower than 0.775 V can reach a delay equal to or lower than 1.5 times the minimal delay. So, the possible range of
$V_{DD}$ is from 0.775 V to 1 V. At the boundaries of the range, 0.775 V and 1 V, the minimal energies are
achieved by adjusting only $V_{DD}$ and only gate size respectively, so the energy there would be larger than in the
intermediate part. The predicted trend of the $E$ versus $V_{DD}$ curve is therefore downward at first, and
then upward.
3.3.2. Solving the $E$ versus $V_{DD}$ curve at 1.5 times the minimal delay. Since the range of $V_{DD}$ is determined, it is
possible to choose $V_{DD}$ values uniformly in the range and calculate the minimal $E$ for each chosen $V_{DD}$.
The constraint equation is:
(11)
A small tolerance on the delay is allowed in case the finite step size yields no exact solution.
The solved results are listed in table 5 and the corresponding graph is illustrated in figure 10.
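Table 5. Relationship between $V_{DD}$ and $E$

| $V_{DD}$ / V | $E$ / Unit |
|---|---|
| 0.775 | 9.555 |
| 0.795 | 6.312 |
| 0.815 | 5.916 |
| 0.835 | 5.926 |
| 0.855 | 6.018 |
| 0.875 | 6.188 |
| 0.895 | 6.369 |
| 0.915 | 6.595 |
| 0.935 | 6.829 |
| 0.955 | 7.070 |
| 0.975 | 7.323 |
| 0.995 | 7.627 |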
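The tabulated points can be scanned for the minimum and for the predicted down-then-up shape. The sketch below uses only the values of table 5; the variable names are illustrative.

```python
# (supply voltage in V, minimal energy in units) copied from table 5.
TABLE5 = [
    (0.775, 9.555), (0.795, 6.312), (0.815, 5.916), (0.835, 5.926),
    (0.855, 6.018), (0.875, 6.188), (0.895, 6.369), (0.915, 6.595),
    (0.935, 6.829), (0.955, 7.070), (0.975, 7.323), (0.995, 7.627),
]

# Supply voltage with the lowest tabulated energy.
best_vdd, best_energy = min(TABLE5, key=lambda row: row[1])

# Check the shape: strictly falling up to the minimum, strictly rising after it.
energies = [energy for _, energy in TABLE5]
i_min = energies.index(best_energy)
falls_first = all(energies[k] > energies[k + 1] for k in range(i_min))
rises_after = all(energies[k] < energies[k + 1]
                  for k in range(i_min, len(energies) - 1))

print(best_vdd, best_energy, falls_first, rises_after)
```

Since the sampled curve has a single minimum, a finer search around 0.815 V would be a natural refinement of the uniform scan.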
Figure 10. $E$ versus $V_{DD}$ curve (Photo/Picture credit: Original)
The graph shows that the trend fits the previous prediction: the $E$ versus $V_{DD}$ curve goes
downward at first, and then upward. To get an overall minimal energy at 1.5 times the minimal delay, the
best $V_{DD}$ is 0.815 V. The comparison between three methods, (1) adjusting $V_{DD}$ only, (2) adjusting gate size only, and (3) adjusting
both variables at the same time, is given in table 6.
Table 6. Methods and corresponding minimal energy

| Method | $E$ / Unit | Percentage of saved energy |
|---|---|---|
| Adjust $V_{DD}$ only | 9.555 | 39.9% |
| Adjust gate-size only | 7.627 | 52.1% |
| Adjust both variables | 5.916 | 62.8% |
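As a consistency check on table 6, each row implies a baseline energy, assuming the saved percentage is measured relative to the critical-path energy at minimal delay. The function name `implied_baseline` and the dictionary keys are illustrative.

```python
# (minimal energy in units, saved energy in percent) copied from table 6.
TABLE6 = {
    "adjust V_DD only": (9.555, 39.9),
    "adjust gate size only": (7.627, 52.1),
    "adjust both": (5.916, 62.8),
}


def implied_baseline(energy: float, saved_percent: float) -> float:
    """Baseline energy E0 such that energy = E0 * (1 - saved_percent / 100)."""
    return energy / (1.0 - saved_percent / 100.0)


baselines = {name: implied_baseline(e, s) for name, (e, s) in TABLE6.items()}
for name, value in baselines.items():
    print(f"{name}: baseline of about {value:.2f} units")
```

All three rows point to a baseline of about 15.9 units, so the percentages are mutually consistent to within rounding.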
4. Conclusion and prospect
The research designed a 4-bit abs-value detector with low delay and reduced its energy consumption by
accepting a delay 1.5 times the minimum. The MUX-based adder implementation of the abs-value
calculator decreases the delay significantly for positive inputs, and a comparator operating from LSB to
MSB lowers the overall delay of the circuit. Implementing XOR and MUX with CMOS instead
of compound gates also saves energy and processing time. Among the 3 ways of lowering energy
consumption, adjusting both $V_{DD}$ and gate size yields the greatest energy saving, which is
62.8%. For BMI circuits in which energy efficiency has priority over speed, the method of
adjusting both $V_{DD}$ and gate size in this research would be
significantly helpful.
The research has the following deficiencies: 1. The assumptions made in chapter 3.1 are not accurate
enough. For example, the MUX on the branch is actually the same MUX that is on the
critical path, so the assumption that all gates on the same stage have the same size as the
corresponding gate on the critical path is inaccurate. Meanwhile, it is possible to achieve a
lower energy for the whole circuit by sizing gates on non-critical paths differently from
the corresponding gates on the critical path. 2. The algorithm used to calculate the minimal energy consumption
for each $V_{DD}$ is nearly brute force and has very high complexity. Therefore, the range and step of
each gate size on the critical path are limited, and a lower minimal energy with higher accuracy is possible.
Future studies may address the deficiencies mentioned above. It would be meaningful to
build a more precise relationship between energy consumption and each gate size, instead of
focusing only on the critical path. It would also be more efficient to optimize the algorithm with methods like binary
search, which would allow a smaller step and a wider range for each gate size and lead to a lower possible
energy consumption. Besides, all the gates here are implemented in CMOS style, but they could also
be implemented in Pass Transistor Logic (PTL) style, so further research can focus on the
delay and energy consumption differences between CMOS and PTL.
References
[1] Musk E. An Integrated Brain-Machine Interface Platform With Thousands of Channels. J Med
Internet Res. 2019;21(10):e16194.
[2] Wu N, Wan S, Su S, Huang H, Dou G, Sun L. Electrode materials for brain–machine interface:
A review. InfoMat. 2021; 3(11): 1174-1194.
[3] Wang PT, et al. A benchtop system to assess the feasibility of a fully independent and
implantable brain-machine interface. J Neural Eng. 2019;16(6):066043.
[4] Yu A, Zhong Y, Zhou X. Design and Optimization of 4-bit Absolute-value Comparator Using
CMOS & PTL Technique. In: 2021 International Conference on Electronic Information
Engineering and Computer Science (EIECS); Changchun, China; 2021. p. 912-916.
[5] Yuan M. An Absolute-value Detector with Threshold Comparing for Spike Detection in Brain-
machine Interface. J Phys Conf Ser. 2021;2113:012038.
[6] Radamson HH, et al. State of the Art and Future Perspectives in Advanced CMOS Technology.
Nanomaterials. 2020;10:1555.
[7] Huang Z. Performance Optimization of 4-bit Absolute Value Detector Based on Structural
Design. J Phys Conf Ser. 2023;2435:012010.
[8] Chen J, Chen M. Design of a 4-bit absolute value detector with balanced energy and delay. AIP
Conf Proc. 2023;3017(1):040001.
[9] Chandrakasan AP, Brodersen RW. Minimizing power consumption in digital CMOS circuits.
Proceedings of the IEEE. 1995;83(4):498-523.
[10] Atin S, Lubis R. Implementation of Critical Path Method in Project Planning and Scheduling.
IOP Conf Ser: Mater Sci Eng. 2019;662(2):022031.