Improving the Spatial Characteristics of Three-Level LUT-Based Mealy FSM Circuits

Authors:
  • The Jacob of Paradies University

Abstract

The main purpose of the method proposed in this article is to reduce the number of look-up-table (LUT) elements in logic circuits of sequential devices. The devices are represented by models of Mealy finite state machines (FSMs). These are so-called MPY FSMs based on two methods of structural decomposition (the replacement of inputs and encoding of output collections). The main idea is to use two types of state codes for implementing systems of partial Boolean functions. Some functions are based on maximum binary codes; other functions depend on extended state codes. The reduction in LUT counts is based on using the method of twofold state assignment. The proposed method makes it possible to obtain FPGA-based FSM circuits with four logic levels. Only one LUT is required to implement the circuit corresponding to any partial function. An example of FSM synthesis using the proposed method is shown. The results of the conducted experiments show that the proposed approach produces LUT-based FSM circuits with better area-temporal characteristics than circuits produced using such methods as Auto and One-hot of Vivado, JEDI, and MPY FSMs. Compared to MPY FSMs, the values of LUT counts are improved. On average, this improvement is 8.98%, but the gain reaches 13.65% for fairly complex FSMs. The maximum operating frequency is slightly improved as compared with the circuits of MPY FSMs (up to 0.64%). For both LUT counts and frequency, the gain increases together with the growth in the numbers of FSM inputs, outputs and states.
Citation: Barkalov, A.; Titarenko, L.; Mazurkiewicz, M.; Krzywicki, K. Improving the Spatial Characteristics of Three-Level LUT-Based Mealy FSM Circuits. Electronics 2023, 12, 1133. https://doi.org/10.3390/electronics12051133
Academic Editors: Leonardo Pantoli, Egidio Ragonese, Paris Kitsos, Gaetano Palumbo and Costas Psychalinos
Received: 5 February 2023
Revised: 23 February 2023
Accepted: 24 February 2023
Published: 26 February 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Improving the Spatial Characteristics of Three-Level LUT-Based
Mealy FSM Circuits
Alexander Barkalov 1,2,†,*, Larysa Titarenko 1,3,†, Małgorzata Mazurkiewicz 4,†,* and Kazimierz Krzywicki 5,†
1 Institute of Metrology, Electronics and Computer Science, University of Zielona Góra, ul. Licealna 9, 65-417 Zielona Góra, Poland
2 Department of Computer Science and Information Technology, Vasyl Stus’ Donetsk National University, 600-richya str. 21, 21021 Vinnytsia, Ukraine
3 Department of Infocommunication Engineering, Faculty of Infocommunications, Kharkiv National University of Radio Electronics, Nauky Avenue 14, 61166 Kharkiv, Ukraine
4 Institute of Control & Computation Engineering, University of Zielona Góra, ul. Licealna 9, 65-417 Zielona Góra, Poland
5 Department of Technology, The Jacob of Paradies University, ul. Teatralna 25, 66-400 Gorzów Wielkopolski, Poland
* Correspondence: a.barkalov@imei.uz.zgora.pl (A.B.); m.mazurkiewicz@issi.uz.zgora.pl (M.M.)
† These authors contributed equally to this work.
Keywords: Mealy FSM; FPGA; LUT; synthesis; replacement of inputs; collections of outputs; twofold state assignment
1. Introduction
To represent various sequential blocks, a model of a Mealy finite state machine (FSM) [1] can be applied. There are many examples of using this model in the implementation of various digital systems [2]. In this paper, we consider FSM circuits implemented using field-programmable gate arrays (FPGAs) [3,4]. This choice is due to the wide use of FPGAs in the implementation of a wide variety of projects [4,5]. Leading experts are confident that FPGAs will continue to dominate logic design for at least the next twenty years [6].
When using any logic basis for the implementation of FSM circuits, a number of optimization problems always arise [7,8]. One of the most important tasks is to obtain a circuit that is optimal in terms of hardware costs. By optimal, we mean a circuit that consumes the minimum possible amount of chip resources while simultaneously providing the required level of performance and power consumption. In the case of FPGA-based circuits [9], the optimization strategy significantly depends on the types of configurable logic blocks (CLBs) used [10]. In this paper, we discuss the most common CLBs which include look-up table (LUT) elements, programmable flip-flops, and dedicated multiplexers [10,11]. To combine these CLBs into an FSM circuit, the following chip resources are used: the synchronization tree, programmable interconnections, and programmable input-outputs [12,13]. The method proposed in this paper is aimed at reducing the number of LUTs (LUT count) in a resulting FSM circuit.
It is generally accepted that reducing LUT count leads to improving the spatial characteristics of FSM circuits (reducing the occupied chip areas) [14,15]. Area reduction can be achieved by applying structural decomposition (SD) methods [9] leading to multi-level FSM circuits. However, such a reduction may have an overhead [9]. This overhead consists of a significant performance degradation compared to equivalent single-level FSM circuits [14,16]. Nevertheless, performance has to be sacrificed if the criterion of design optimality is the minimum occupied chip area.
The best LUT counts can be obtained for three-level FSM circuits when the methods of replacing FSM inputs and encoding collections of FSM outputs [17] are used together. However, for sufficiently complex FSMs, some of the logic blocks (or even all three blocks) may have a multilevel structure. This leads to an increase in the number of logical levels and interconnections. In turn, this leads to an increase in the occupied area, power consumption and delay time of the FSM circuit. In this paper, we propose a method to reduce the LUT counts of three-level FSM circuits. The proposed method is based on using twofold state assignment [18]. This approach leads to a decrease in the number of LUTs and their levels in the resulting LUT-based FSM circuits.
Several leading companies produce FPGA chips. The largest producer is AMD Xilinx [19]. As follows from [4], FPGAs from AMD Xilinx are widely used in various projects. Due to this, we structured our approach according to the FPGA families [19] by AMD Xilinx. In our research, we use FPGAs from the Virtex-7 family [10].
The article contains several new scientific results. Firstly, a new architecture of an LUT-based Mealy FSM circuit is proposed. Secondly, methods for the uniform distribution of inputs and state encoding are proposed, which make it possible to reduce the number of LUTs in the circuit of the input replacement block in comparison with the known methods for implementing this block. Thirdly, a new method for stabilizing FSM outputs is proposed, in which the input register is replaced by a register of output collection codes. The noted new approaches led to the main contribution of the article, which is a novel design method aimed at hardware reduction in the multilevel circuits of LUT-based Mealy FSMs. The hardware reduction is achieved due to the use of two types of state codes. The maximum binary state codes are used to replace the FSM inputs. Other partial Boolean functions depend on extended state codes. The proposed approach leads to four-level FSM circuits where any partial function is represented by a single LUT. The conducted experiments show that the resulting FSM circuits include fewer LUTs compared to equivalent three-level circuits [17]. It is very important that the hardware reduction does not lead to the significant deterioration of temporal characteristics.
The rest of the paper is organized as follows. Section 2 shows the peculiarities of the LUT-based Mealy FSM design. The analysis of related works is discussed in Section 3. Section 4 presents the main idea of our method. In Section 5, we include a step-by-step example showing how to apply the proposed method. Section 6 includes the experimental results. The last part of the article is a short conclusion.
2. Peculiarities of LUT-Based Mealy FSM Design
The law of the behaviour of a Mealy FSM can be represented using three sets and two functions [20]. These sets are the following: a set of internal states $S = \{s_1, \ldots, s_M\}$, a set of inputs $X = \{x_1, \ldots, x_L\}$, and a set of outputs $Y = \{y_1, \ldots, y_N\}$. The interstate transitions are represented by a function of transitions. An output function shows the FSM outputs generated during these transitions. In this article, we use a state transition graph (STG) [1] as an initial tool for FSM design. An STG consists of vertices representing FSM states. The vertices are connected by arcs corresponding to interstate transitions. Each arc is marked by an input signal (the conjunction of inputs leading to a particular transition) and a collection of outputs associated with this transition [1]. To synthesize the FSM circuit, we transform this STG into the equivalent state transition table (STT) [1].
To design an FSM circuit, it is necessary to replace abstract states $s_m \in S$ with binary codes $K(s_m)$. This is the state-assignment step [1]. To minimize the number of state variables and input memory functions (IMFs), it is necessary to minimize the number of bits in state codes. The minimum possible number $R_{MB}$ of state-code bits corresponds to a maximum binary state assignment [20]. This number is determined as

$$R_{MB} = \lceil \log_2 M \rceil. \qquad (1)$$
To encode states, state variables creating a set $T = \{T_1, \ldots, T_{R_{MB}}\}$ are used. To keep the state codes, a special register, RG, consisting of $R_{MB}$ flip-flops is used as a part of the FSM circuit.
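As a quick numerical check of (1), the following minimal sketch computes the number of state-code bits for a given number of states. The function name `min_state_bits` is ours and is introduced only for illustration.

```python
def min_state_bits(num_states: int) -> int:
    """Return R_MB = ceil(log2(M)) from Eq. (1) using exact integer arithmetic."""
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    # For every M >= 1, (M - 1).bit_length() equals ceil(log2(M)).
    return (num_states - 1).bit_length()


if __name__ == "__main__":
    for m in (6, 16, 500):
        print(f"M = {m}: R_MB = {min_state_bits(m)}")
```

Integer arithmetic is used instead of `math.log2` to avoid any floating-point rounding at exact powers of two.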
In most practical cases [9], synchronous D flip-flops are used as elements of the state register. Each state variable is represented by a unique flip-flop. The input of the $r$-th flip-flop is connected with an IMF $D_r \in D$, where $D = \{D_1, \ldots, D_{R_{MB}}\}$ is a set of IMFs. The initial state code is forcibly loaded into RG. To do this, a special pulse of initialization Start is used. Set $D$ determines a state code loaded into RG. To load a code $K(s_m)$, the pulse of synchronization Clock is used.
Using either STG or STT, a direct structure table (DST) [20] can be constructed. There are six columns in the DST [20]: $s_C$, $s_T$, $X_h$, $Y_h$, $D_h$, $h$. The data from these columns have the following meaning: $s_C$ is an initial state for a given transition; $s_T$ is a final state for this transition; $X_h$ is a conjunction of FSM inputs determining the transition $\langle s_C, s_T \rangle$; $Y_h$ is a collection of outputs (CO) produced during the transition $\langle s_C, s_T \rangle$; $D_h$ is a set of IMFs equal to 1 to execute the $h$-th transition (to load the code $K(s_T)$ into RG); and $h$ is the transition number ($h \in \{1, \ldots, H\}$). The DST is a base for constructing the following systems of Boolean functions (SBFs) [21]:

$$D = D(T, X); \qquad (2)$$

$$Y = Y(T, X). \qquad (3)$$
The SBFs (2) and (3) are a base for implementing the so-called P Mealy FSM [9]. In FPGA-based FSMs, the flip-flops of RG are distributed among the CLBs, including LUTs, generating the functions (2). Thus, the distributed state-code register is hidden. As a result, there are only two blocks in the structural diagram of LUT-based P Mealy FSM (Figure 1).
[Diagram: blocks LT and LY with signals X, Start, Clock, T and Y]
Figure 1. Structural diagram of LUT-based P Mealy FSM.
The LUTs of a block LT implement IMFs (2). The memory elements of LT create the RG. This explains why the pulses Start and Clock enter LT. Obviously, the state variables $T_r \in T$ come out of the block LT. The block LY generates functions (3) representing the outputs $y_n \in Y$. Each LUT has $S_L$ inputs.
The functions (2) and (3) are represented by their sum of products (SOPs) [1]. An SOP of a Boolean function $f_i \in D \cup Y$ has $NI(f_i)$ literals. For rather complex FSMs, the following condition may hold:

$$NI(f_i) > S_L. \qquad (4)$$
If (4) takes place, then the circuit of P Mealy FSM is multi-level. It is known [9] that multi-level circuits are less efficient than the equivalent single-level circuits (the former are much slower and require more power than the latter). The same is true for the numbers of interconnections in the equivalent single-level and multi-level circuits. The growth in interconnections leads to the further growth in the values of both time of cycle and power consumption. The use of SD-based methods can lead to a significant improvement in the overall circuit quality [9,17].
There are two types of literals in SOPs of functions (2) and (3): external inputs $x_l \in X$ and elements of the set $T$ (the variables $T_r \in T$). Each function $f_i \in D \cup Y$ depends on $R_i \le R_{MB}$ state variables and $L_i \le L$ inputs. There is only one LUT in the circuit corresponding to the function $f_i \in D \cup Y$, if the following condition is true:

$$R_i + L_i \le S_L. \qquad (5)$$
If condition (5) holds, then the values of function $f_i \in D \cup Y$ are generated by a single-LUT circuit. If condition (5) takes place for all $R_{MB} + N$ functions, then the circuit of P Mealy FSM is single-level. A single-level circuit has the best possible values of the required chip area, power consumption and maximum operating frequency.
However, there are FSMs with around 500 states and 30 inputs [2]. In this case, $R_{MB} = \lceil \log_2 500 \rceil = 9$, so each function $f_i \in D \cup Y$ may depend on up to $9 + 30 = 39$ arguments. Thus, their SOPs can include up to 39 literals. Of course, these SOPs cannot be implemented using only a single LUT with $S_L = 6$ inputs. Thus, the corresponding circuits will be multi-level with spaghetti-type interconnecting systems. To improve the characteristics of multi-level circuits, various optimization methods should be applied. In this paper, we propose an approach which allows reducing the chip area occupied by the LUT-based FSM circuit when the condition (5) is violated.
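The single-LUT test (5) and the 39-argument estimate can be reproduced with a few lines of code. This is only an illustrative sketch; the helper names `fits_single_lut` and `worst_case_arguments` are our own and do not come from the cited works.

```python
def fits_single_lut(num_state_vars: int, num_inputs: int, lut_inputs: int) -> bool:
    """Condition (5): R_i + L_i <= S_L."""
    return num_state_vars + num_inputs <= lut_inputs


def worst_case_arguments(num_states: int, num_fsm_inputs: int) -> int:
    """Upper bound R_MB + L on the number of arguments of one function."""
    r_mb = (num_states - 1).bit_length()  # ceil(log2(M)), Eq. (1)
    return r_mb + num_fsm_inputs


if __name__ == "__main__":
    # FSM with about 500 states and 30 inputs, LUTs with S_L = 6 inputs
    print(worst_case_arguments(500, 30))
    print(fits_single_lut(9, 30, 6))
    # A small function with 3 state variables and 3 inputs fits one LUT
    print(fits_single_lut(3, 3, 6))
```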
3. Brief Analysis of Related Works
The problem of area reduction is discussed in thousands of monographs and articles. For example, various methods for solving this problem are proposed in the following works (to name but a few): [14,22–28]. As follows from [23], reducing the required chip area is connected with reducing the LUT count for a corresponding circuit. To achieve this goal, three groups of methods can be used: a proper state assignment, a functional decomposition (FD) of Boolean functions, and SD-based approaches [9].
The proper state assignment leads to the elimination of some literals from SOPs (2) and (3) [20]. If the elimination of literals results in the fulfilment of condition (5) for SOPs of all functions (2) and (3), then the resulting FSM circuit is single-level. This can be achieved using, for example, the state assignment method JEDI distributed with the CAD system SIS [29]. JEDI-based optimization is achieved by creating adjacent codes for states whose transitions depend on the same FSM inputs $x_l \in X$. As shown in [30], this allows elimination of up to 3 literals from SOPs representing benchmark FSMs from the library LGSynth93 [31]. Thus, JEDI can solve the optimization problem if the relation $NI(f_i) - S_L \le 3$ holds. However, this relation only takes place for rather simple FSMs [9].
As follows from various research [32–35], there is no best universal state-assignment approach. For example, optimization success depends on how many variables $x_l \in X$ the transitions from each state depend on. For different FSMs, the same state-assignment method may either improve or deteriorate the quality of resulting circuits. In addition, the optimization strategy depends strongly on the peculiarities of the logic elements used [33]. If LUTs are used, the spatial improvement can be achieved due to an increase in the state-code length [36]. In the extreme case, the number of bits is equal to $M$. This is a one-hot state assignment [1], when the RG includes $M$ flip-flops. The results of research reported in [32] show that the one-hot state assignment can improve the FSM characteristics if $M > 16$. However, it is necessary to take into account the number of FSM inputs [34]. As shown in [32], using maximum binary codes (MBCs) improves FSM quality if $L > 10$ (compared to FSMs with one-hot codes). This situation stimulates the development of new types of state codes and encoding strategies.
If no state-assignment method allows the implementation of a single-level circuit for a given FSM, then decomposition methods should be applied. In this case, the initial functions (2) and (3) are represented as a composition of partial Boolean functions (PBFs). The decomposition is executed until each partial function has no more than $S_L$ literals, i.e., until condition (4) no longer holds for any partial function. Any kind of decomposition leads to a multi-level FSM circuit.
In the case of FD-based FSM circuits, CLBs are connected by complicated systems of “spaghetti-type” interconnections [11]. Such circuits have much lower clock rates compared to equivalent single-level solutions. This is connected with the fact that, now, “...wires delay has come to dominate logic delay” [37]. In addition, compared to single-level circuits, FD-based circuits are more power-consuming. This phenomenon is due to the fact that the interconnections absorb up to 70% of the total power consumed by an FPGA-based FSM circuit [37]. However, the advantage of FD is that it is applicable to the implementation of Boolean functions of any practical complexity. Therefore, FD-based algorithms are used in all industrial CAD systems aimed at the implementation of FPGA-based digital systems [38–41].
In many cases, the methods of structural decomposition [9] allow the production of FSM circuits with better space-time-energy characteristics than their FD-based counterparts. The SD-based FSM circuits can be viewed as a composition of large logic blocks with unique input-output systems. Such an approach leads to the regularization of interconnections compared to FD-based FSM circuits [16]. Different methods of SD can be used together. Due to this, the number of blocks can vary from 2 to 4, depending on how many methods are used. The methods of SD and FD can be used together [9].
Two methods of SD are most commonly used. One of them is the replacement of inputs (RI) with some additional variables [9]. The second method is the encoding of COs [9]. Below is a brief description of these methods.
The process of RI comes down to replacing inputs $x_l \in X$ with the additional variables from a set $B = \{b_1, \ldots, b_G\}$. The replacement makes sense if $G \ll L$ [9]. As a result, the SBFs (2) and (3) are replaced by the systems

$$B = B(T, X); \qquad (6)$$

$$D = D(T, B); \qquad (7)$$

$$Y = Y(T, B). \qquad (8)$$
The system (6) is represented by a block with inputs $x_l \in X$ and $T_r \in T$. In the following text, we denote this block with the symbol LB. Obviously, the circuit of LB consumes some chip resources. The systems (7) and (8) are implemented by block LTY. This approach makes sense if the SOPs (7) and (8) include far fewer literals than the SOPs (2) and (3) [9]. In this case, the LUT count of the P FSM circuit significantly exceeds the total number of LUTs necessary to implement SBFs (6)–(8).
During the interstate transitions, $Q$ different COs $Y_q \subseteq Y$ are generated. Each CO can be represented by a code $K(Y_q)$. This code includes $R_{CO}$ bits [9]:

$$R_{CO} = \lceil \log_2 Q \rceil. \qquad (9)$$

The COs are encoded using some additional variables creating a set $Z = \{z_1, \ldots, z_{R_{CO}}\}$. If this approach is applied together with the RI, then the SBF (3) is replaced with the following SBFs:

$$Z = Z(T, B); \qquad (10)$$

$$Y = Y(Z). \qquad (11)$$

The system (10) depends on the same variables as the system (7). Thus, these two SBFs are implemented using the same block, LTZ. To implement SBF (11), block LY is used. Combining these methods turns the original P FSM (Figure 1) into MPY FSM (Figure 2).
[Diagram: blocks LB, LTZ and LY with signals X, Start, Clock, T, B, Z and Y]
Figure 2. Structural diagram of MPY Mealy FSM.
In MPY FSM, the block LB generates the additional variables (6). The block LTZ generates IMFs represented by (7) and additional variables (10). The block LY generates the FSM outputs (11). As shown in [17], the transition from P FSM to MPY FSM allows the reduction in LUT counts in equivalent FSM circuits. Of course, this area reduction leads to a decrease in the value of maximum operating frequency. This decrease can be viewed as the area-reducing overhead.
To obtain SBF (6), a table of RI should be constructed [20]. Its columns are marked by states $s_m \in S$, whereas additional variables $b_g \in B$ mark its rows. There is a symbol $x_l$ written at the intersection of a row $b_g \in B$ and column $s_m \in S$, if the variable $b_g \in B$ replaces the input $x_l \in X$ for the state $s_m \in S$. In fact, the block LB is a multiplexer, the information inputs of which are connected to inputs $x_l \in X$ and the control inputs are connected to state variables $T_r \in T$.
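The multiplexer view of block LB can be illustrated with a small behavioural model. The table contents below are hypothetical (they are not taken from the paper's example), the state is passed by name rather than by its code $K(s_m)$, and returning 0 for an unused table cell is our assumption for a don't-care value.

```python
from typing import Dict, Optional

# Hypothetical table of RI: RI_TABLE[b_g][s_m] names the input x_l that
# variable b_g replaces in state s_m; None marks an empty cell.
RI_TABLE: Dict[str, Dict[str, Optional[str]]] = {
    "b1": {"s1": "x1", "s2": "x3", "s3": None},
    "b2": {"s1": "x2", "s2": None, "s3": "x4"},
}


def replace_inputs(state: str, inputs: Dict[str, int]) -> Dict[str, int]:
    """Behavioural model of block LB, i.e. of system (6): B = B(T, X)."""
    result: Dict[str, int] = {}
    for b_name, row in RI_TABLE.items():
        selected = row.get(state)
        # The current state selects which input is routed to b_g.
        result[b_name] = inputs[selected] if selected is not None else 0
    return result


if __name__ == "__main__":
    values = {"x1": 1, "x2": 0, "x3": 1, "x4": 1}
    for s in ("s1", "s2", "s3"):
        print(s, replace_inputs(s, values))
```

In this toy table, four inputs are funnelled into two additional variables, which is the source of the literal reduction in systems (7) and (8).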
To obtain SBFs (7) and (10), it is necessary to create a transformed DST. In the transformed DST, the column $X_h$ is replaced by a column $B_h$, whereas the column $Y_h$ is replaced by a column $Z_h$. These new columns are filled in as follows. For example, the first row of DST includes a CO $Y_2$ generated during a transition $\langle s_1, s_2 \rangle$ caused by the input signal $X_1 = x_1 x_2$. Let the following relations take place for the state $s_1 \in S$: $x_1 = b_1$ and $x_2 = b_2$. In this case, the input signal $X_1 = x_1 x_2$ is replaced by the conjunction $B_1 = b_1 b_2$ written in the column $B_h$. If $K(Y_2) = 101$, then the additional variables $z_1, z_3 \in Z$ are written in the column $Z_h$. All other rows of the transformed DST are filled in the same manner.
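The filling rule for one row of the transformed DST can be sketched as follows. The helper names are ours, the sketch ignores negated literals for brevity, and it assumes that $z_1$ corresponds to the leftmost bit of the code $K(Y_q)$, which matches the example $K(Y_2) = 101$ above.

```python
from typing import Dict, List


def replace_conjunction(literals: List[str], replacement: Dict[str, str]) -> List[str]:
    """Map each input x_l of the conjunction X_h to the variable b_g replacing it."""
    return [replacement[x] for x in literals]


def co_code_to_variables(code: str) -> List[str]:
    """List the variables z_r equal to 1 in K(Y_q); z_1 is the leftmost bit."""
    return [f"z{r}" for r, bit in enumerate(code, start=1) if bit == "1"]


if __name__ == "__main__":
    # Row 1 of the example: X_1 = x1 x2 with x1 = b1, x2 = b2 in state s1
    print(replace_conjunction(["x1", "x2"], {"x1": "b1", "x2": "b2"}))
    # K(Y_2) = 101
    print(co_code_to_variables("101"))
```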
To obtain SBF (11), it is necessary to create the Karnaugh map whose cells are marked by the variables $z_r \in Z$. The symbols $Y_q$ are written inside the cells. Using this map, the minimized SOPs (11) are constructed. The minimization makes sense if some literals are eliminated from all product terms of a SOP representing a function $y_n \in Y$ [9].
The application of this approach is most efficient if every function $f_i \in B \cup D \cup Z \cup Y$ has no more than $S_L$ literals, i.e., if condition (4) does not hold for any of them [9]. Otherwise, there will be more than a single LUT in the circuits for functions that satisfy condition (4). Moreover, this leads to the multi-levelness of the corresponding blocks, which further reduces the MPY FSM performance. To implement these multi-level circuits, the methods of FD should be applied.
To overcome this shortcoming of MPY FSM, we propose to transform its structural
diagram using the method of two-fold state assignment (TSA) [18]. This idea is discussed
in the next section.
4. Main Idea of Proposed Method
To execute the TSA, it is necessary to create a partition $\pi_S = \{S^1, \ldots, S^K\}$ of the set of states. As a result, each state $s_m \in S$ has two codes. The maximum binary code $K(s_m)$ has $R_{MB}$ bits. This code represents a state as some element of the set $S$. The partial code $C(s_m)$ represents a state as some element of a class $S^k \in \pi_S$. This class includes $M_k$ elements. To encode them, $R_k$ bits are sufficient:

$$R_k = \lceil \log_2 (M_k + 1) \rceil. \qquad (12)$$

In (12), the value of $M_k$ is incremented to encode the relation $s_m \notin S^k$. We use the code with all zeroes to encode this relation. This code represents the state $s_m \in S^k$ for all classes other than $S^k$.
The codes $C(s_m)$ for all classes $S^k \in \pi_S$ form an extended state code (ESC) of the state $s_m \in S^k$. Each ESC includes $R_S$ bits, where

$$R_S = R_1 + \cdots + R_K. \qquad (13)$$
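Formulas (12) and (13) and the structure of an ESC can be illustrated with a short sketch. The partition used here is hypothetical, and the partial codes are simply assigned in list order, which is an arbitrary choice made for illustration and not the assignment procedure of the proposed method.

```python
from typing import Dict, List


def partial_code_bits(class_size: int) -> int:
    """Eq. (12): R_k = ceil(log2(M_k + 1)); the all-zero code is reserved."""
    # For M_k >= 0, M_k.bit_length() equals ceil(log2(M_k + 1)).
    return class_size.bit_length()


def extended_state_codes(partition: List[List[str]]) -> Dict[str, str]:
    """Concatenate K fields; only the field of the state's own class is non-zero."""
    widths = [partial_code_bits(len(cls)) for cls in partition]
    codes: Dict[str, str] = {}
    for k, cls in enumerate(partition):
        # Partial codes start at 1 because 0 encodes "state is not in this class".
        for index, state in enumerate(cls, start=1):
            fields = ["0" * w for w in widths]
            fields[k] = format(index, f"0{widths[k]}b")
            codes[state] = "".join(fields)
    return codes


if __name__ == "__main__":
    pi_s = [["s1", "s2", "s3"], ["s4", "s5", "s6"]]  # hypothetical partition, K = 2
    r_s = sum(partial_code_bits(len(cls)) for cls in pi_s)  # Eq. (13)
    print("R_S =", r_s)
    for state, esc in extended_state_codes(pi_s).items():
        print(state, esc)
```

For this partition, $R_1 = R_2 = 2$ and $R_S = 4$, whereas the maximum binary code of the same six states needs only $R_{MB} = 3$ bits; the ESC trades extra code bits for partial functions with fewer arguments.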
To create ESCs, the additional variables are used. These variables are elements of a set $V = V^1 \cup V^2 \cup \ldots \cup V^K$. The variables $v_r \in V^k$ create the codes $C(s_m)$ for the states $s_m \in S^k$. To generate ESCs, it is necessary to transform state codes $K(s_m)$ into codes $C(s_m)$ for all states $s_m \in S$. To transform the codes, it is necessary to create the following SBF:

$$V = V(T). \qquad (14)$$
We discuss a case wherein both the replacement of inputs and encoding of COs are executed. In this case, each class $S^k \in \pi_S$ determines three sets. A set $B^k \subseteq B$ includes variables $b_g \in B$ determining transitions from the states $s_m \in S^k$. A set of additional variables $Z^k \subseteq Z$ includes elements determining COs generated during transitions from the states $s_m \in S^k$. Finally, a set $D^k \subseteq D$ includes IMFs equal to 1 in the codes of the states next to states $s_m \in S^k$. Each class $S^k \in \pi_S$ determines the following systems of PBFs:

$$D^k = D^k(V^k, B^k); \qquad (15)$$

$$Z^k = Z^k(V^k, B^k). \qquad (16)$$
To obtain the final values of functions $D_r \in D$ and $z_r \in Z$, it is necessary to create the following SBFs:

$$D = D(D^1, \ldots, D^K); \qquad (17)$$

$$Z = Z(Z^1, \ldots, Z^K). \qquad (18)$$

The functions $f_i \in D \cup Z$ are disjunctions of corresponding PBFs.
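The disjunction in (17) and (18) can be modelled behaviourally as a bitwise OR over the values produced for each class. This is a sketch under our reading of the text: the current state belongs to exactly one class, so the blocks of all other classes see the all-zero partial code and are assumed to output zeros. The function name and the sample values are hypothetical.

```python
from typing import Dict, List


def combine_partial_functions(partial_values: List[Dict[str, int]]) -> Dict[str, int]:
    """OR together the partial values of each function, as in (17) and (18)."""
    combined: Dict[str, int] = {}
    for block_output in partial_values:
        for name, value in block_output.items():
            combined[name] = combined.get(name, 0) | value
    return combined


if __name__ == "__main__":
    # Hypothetical values for K = 2 classes; the current state lies in class 1.
    class_1 = {"D1": 1, "D2": 0, "z1": 1}
    class_2 = {"D1": 0, "D3": 0, "z1": 0, "z2": 0}
    print(combine_partial_functions([class_1, class_2]))
```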
The combined use of these three methods of SD leads to MP$_T$Y Mealy FSMs. The subscript “T” shows that the two-fold state assignment is used. Its structural diagram consists of four logic levels (Figure 3).
In MP$_T$Y Mealy FSM, the block LB generates functions (6) to replace FSM inputs using additional variables. The second logic level consists of blocks LB1, ..., LBK. Each block LBk implements systems of PBFs (15) and (16). These functions are transformed into functions $f_i \in D \cup Z$ by the block LTZ. This block represents the third logic level of the FSM circuit. The block LTZ includes two distributed registers. One of them is the state code register RG. The RG outputs are used as a feedback for the input transformation. In addition, they enter a block LV to create ESCs. The second register (a register RZ) keeps the codes of COs. We discuss the necessity of RZ later. Both registers are zeroed by the pulse Start and synchronized by the pulse Clock. The fourth logic level includes two blocks. The block LY generates FSM outputs represented by (11). The block LV transforms the maximum state codes $K(s_m)$ into extended state codes $C(s_m)$. This block implements SBF (14).
To reduce the chip area occupied by the LUT-based circuit of MP$_T$Y Mealy FSM, we propose two new approaches. One of them allows the reduction of the number of LUTs and their levels in the circuit of LB. The second method aims to reduce the number of flip-flops necessary for the stabilization of the FSM operation.
[Diagram: blocks LB, LB1, ..., LBK, LTZ, LY and LV with signals X, Start, Clock, T, B, $B^k$, $V^k$, $D^k$, $Z^k$, Z, V and Y]
Figure 3. Structural diagram of MP$_T$Y Mealy FSM.
We use the symbol $X(b_g)$ for a set of FSM inputs replaced by an additional variable $b_g \in B$. As a rule, the RI is executed in the following way [20]: the number of FSM inputs in different sets $X(b_g)$ should be maximal. At best, identical inputs $x_l \in X$ should be replaced by the same variable $b_g \in B$. Such an approach allows minimization of the chip area if an FSM circuit is implemented using programmable logic arrays (PLAs) [9]. However, PLAs have a lot of inputs, whereas this number is very limited for LUTs. Thus, we propose distributing inputs $x_l \in X$ in a way which allows holding the following condition for the maximum possible number of sets $X(b_g)$:

$$|X(b_g)| + R_{MB} = S_L. \qquad (19)$$
Obviously, if (19) takes place for the set $X(b_g)$, then a circuit generating the function $b_g \in B$ includes only one LUT. If (19) takes place for all sets $X(b_g)$, then the block LB includes $G$ LUTs. In addition, this circuit is single-level.
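Condition (19) is easy to check for a candidate distribution of inputs. The sketch below uses a hypothetical distribution (not the one from the paper's example) and a helper name of our own; it only reports, for each $b_g$, whether the LUT inputs are used exactly as (19) requires.

```python
from typing import Dict, Set


def satisfies_condition_19(x_sets: Dict[str, Set[str]], r_mb: int,
                           lut_inputs: int) -> Dict[str, bool]:
    """For each b_g, test |X(b_g)| + R_MB == S_L, i.e. condition (19)."""
    return {b: len(xs) + r_mb == lut_inputs for b, xs in x_sets.items()}


if __name__ == "__main__":
    # Hypothetical distribution of seven inputs among G = 3 variables,
    # with R_MB = 3 state variables and LUTs having S_L = 5 inputs.
    distribution = {
        "b1": {"x1", "x4"},
        "b2": {"x2", "x5"},
        "b3": {"x3", "x6", "x7"},
    }
    print(satisfies_condition_19(distribution, r_mb=3, lut_inputs=5))
```

With $S_L = 5$ and $R_MB = 3$, only sets of exactly two inputs satisfy (19), so `b3` fails the check in this toy distribution.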
To increase the value of $|X(b_g)|$, we propose to encode the states in a way that decreases the number of state variables in functions (6). Let $S(b_g) \subseteq S$ be a set of states whose transitions depend on the inputs $x_l \in X(b_g)$. We propose to encode the states $s_m \in S(b_g)$ in such a way that their codes create the minimum possible number of generalized cubes of $R_{MB}$-dimensional Boolean space. This approach allows excluding some state variables as literals of SOPs (6).
As a rule, FSMs are not stand-alone units. They are used as parts of a digital system. Due to this, the stability of the outputs is one of the very important problems in FSM circuit design [13,42,43]. If an FSM is a part of some digital system, then the FSM outputs are inputs of other system’s blocks. It is known [1,20] that outputs of Mealy FSMs are unstable: input fluctuations may lead to output fluctuations. In turn, these fluctuations of FSM outputs may cause failure in some blocks of a digital system. It is possible to avoid such failures by stabilizing the FSM inputs. To do this, it is necessary to introduce a synchronous register of inputs (RI) [20]. This changes the FSM operation mode.
De facto, the set of inputs $X = \{x_1, \ldots, x_L\}$ consists of outputs of various system blocks. These outputs enter the flip-flops of RI. While these outputs are in transition, the synchronization signal of RI is not active. Due to this, the FSM is disconnected from other blocks. Thus, the RI keeps the values of FSM inputs registered in the previous cycle. After the stabilization of system outputs, they are loaded into the RI using the required edge of synchronization. However, eliminating the dependence of the inputs’ stability on the stability of system outputs leads to additional area costs and reduces overall performance. This is an overhead of stability (additional LUTs, flip-flops, interconnections, power consumption and delay). Therefore, it makes sense to reduce this overhead.
In our paper, we propose to include a register RZ into block LTZ. There is a flip-flop in each CLB generating a function $z_r \in Z$. Thus, to organize the RZ, there is no need for additional LUTs. In addition, these flip-flops could be controlled by already-existing pulses Start and Clock. Obviously, the proposed approach does not require additional CLBs. This means that it does not require the additional chip area (compared to an FSM architecture which uses either a registration of inputs or a registration of outputs).
A method for the synthesis of MP$_T$Y Mealy FSMs is proposed in this paper. We start the design from an STG [1]. To create tables representing the blocks of the FSM circuit, the STG is transformed into the equivalent STT [1]. The proposed method includes the following steps:
1. Creating STT of Mealy FSM.
2. Executing replacement of FSM inputs.
3. Assignment of maximum binary state codes $K(s_m)$ optimizing SBF (6).
4. Creating SBF (6) representing the block LB.
5. Finding the partition $\pi_S$ with the minimum cardinality number.
6. Assignment of partial codes $C(s_m)$ to states $s_m \in S^k$.
7. Encoding of COs $Y_q \subseteq Y$ using maximum binary codes.
8. Creating SBF (11) representing the block LY.
9. Constructing tables of LB1–LBK and creating SBFs (15) and (16).
10. Constructing the table of LTZ and creating systems (17) and (18).
11. Constructing table of LV and deriving the system (14).
12. Implementing LUT-based circuit of MP$_T$Y FSM.
If an FSM A is synthesized using the model of MPTY Mealy FSM, then we denote such
a situation by the symbol MPTY(A). Next, we discuss an example of MPTY FSM synthesis.
5. Example of Synthesis of MPTY Mealy FSM Logic Circuit
We discuss the synthesis of Mealy FSM MPTY(A1) using LUTs with $S_L = 5$ inputs.
The STG (Figure 4) represents the FSM A1.
Using the STG (Figure 4), we can derive the sets $S = \{s_1, \ldots, s_6\}$ (each vertex of the STG
corresponds to a state); $X = \{x_1, \ldots, x_8\}$ (these inputs are shown above the STG arcs); and
$Y = \{y_1, \ldots, y_9\}$ (these outputs are written above the STG arcs). This gives the following
values: $M = 6$, $L = 8$, and $N = 9$. There are $H = 17$ arcs connecting the vertices of the STG
(Figure 4). Obviously, there are $H = 17$ rows in the equivalent STT. As follows from (1),
$R_{MB} = 3$ is necessary to execute the maximum binary state assignment. This gives the sets
$T = \{T_1, T_2, T_3\}$ and $D = \{D_1, D_2, D_3\}$.
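The value of $R_{MB}$ can be checked with a few lines of Python. This is only a sketch: it assumes that formula (1) is the usual minimum code length $\lceil \log_2 M \rceil$, and the function name `min_code_bits` is introduced here for illustration.

```python
def min_code_bits(n: int) -> int:
    """Return the smallest R with 2**R >= n, i.e. ceil(log2(n)) for n >= 1."""
    if n < 1:
        raise ValueError("at least one item is needed")
    # (n - 1).bit_length() equals ceil(log2(n)) without floating-point rounding.
    return (n - 1).bit_length()


M = 6                      # number of states of FSM A1
R_MB = min_code_bits(M)    # assumed form of formula (1)
state_variables = [f"T{r}" for r in range(1, R_MB + 1)]
input_memory_functions = [f"D{r}" for r in range(1, R_MB + 1)]
print(R_MB, state_variables, input_memory_functions)
```

The integer form is used instead of `math.log2` so that exact powers of two are handled without rounding issues.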
Step 1. The procedure of transformation is executed using the approach shown in
[1]. Each arc of the STG determines a row of the STT. Each row includes a current state $s_C$, a
transition state $s_T$, an input signal $X_h$ which determines the transition from $s_C$ into $s_T$, an
output collection $Y_h$, and the row number, $h$. In the discussed example, the STG (Figure 4)
is transformed into the STT (Table 1). This table includes an additional column $q$ containing
the subscripts of COs written in each row of the column $Y_h$.
Figure 4. State transition graph of Mealy FSM A1. [Graph with vertices s1–s6; each arc is marked by an input signal and an output collection, as listed in Table 1.]
Table 1. State transition table of FSM A1.
s_C   s_T   X_h        Y_h        q    h
s1    s1    x1         -          1    1
s1    s2    x1x2       y1y7       2    2
s1    s5    x1x2x3     y2         3    3
s1    s3    x1x2x3     y1y2       4    4
s2    s2    x4         y3y6       5    5
s2    s4    x4x5       y1y3y7     6    6
s2    s6    x4x5x6     y4y9       7    7
s2    s5    x4x5x6     y4y5       8    8
s3    s6    1          y5y8       9    9
s4    s4    x1         y6y8y9     10   10
s4    s6    x1x7       y2         3    11
s4    s1    x1x7x8     -          1    12
s4    s2    x1x7x8     y3y6       5    13
s5    s6    x7         y4y9       7    14
s5    s3    x7         y1y2       4    15
s6    s4    x3         y1y7       2    16
s6    s1    x3         y4y5       8    17
Step 2. The interstate transitions from $s_m \in S$ depend on inputs creating the set
$X(s_m) \subseteq X$ with $NI_m$ elements. To find the number, $G$, of additional variables $b_g \in B$, it is
necessary to use the following formula [20]:
$G = \max(NI_1, \ldots, NI_M)$. (20)
As follows from Table 1, the existing sets $X(s_m) \subseteq X$ have the following cardinality
numbers: $NI_1 = NI_2 = NI_4 = 3$, $NI_5 = NI_6 = 1$, and $NI_3 = 0$. Using (20) gives $G = 3$ and
$B = \{b_1, b_2, b_3\}$.
Thus, there is $S_L = 5$ and $R_{MB} = 3$. Using (19) gives $|X(b_g)| = S_L - R_{MB} = 2$. Thus,
the replacement of inputs should be executed in a way so that the relation $|X(b_g)| = 2$ holds
for the maximum possible number of sets $X(b_g)$. Using the proposed approach gives the
distribution of inputs shown in Table 2.
Table 2. Table of RI for FSM A1.
B \ S   s1   s2   s3   s4   s5   s6
b1      x1   x4   -    x1   -    -
b2      x2   x5   -    x7   x7   -
b3      x3   x6   -    x8   -    x3
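Formula (20) and the sizes of the sets $X(b_g)$ can be checked against Table 2. The following Python sketch is only an illustration: the dictionary layout and all identifiers are assumptions introduced here, and the data are transcribed from Table 2.

```python
# RI[b][s] is the input replaced by variable b in state s (None stands for "-").
RI = {
    "b1": {"s1": "x1", "s2": "x4", "s3": None, "s4": "x1", "s5": None, "s6": None},
    "b2": {"s1": "x2", "s2": "x5", "s3": None, "s4": "x7", "s5": "x7", "s6": None},
    "b3": {"s1": "x3", "s2": "x6", "s3": None, "s4": "x8", "s5": None, "s6": "x3"},
}
STATES = ["s1", "s2", "s3", "s4", "s5", "s6"]

# X(s_m): inputs that determine the transitions from state s_m.
inputs_per_state = {s: {row[s] for row in RI.values() if row[s]} for s in STATES}
# Formula (20): G is the largest number of inputs used in any single state.
G = max(len(xs) for xs in inputs_per_state.values())

# X(b_g): inputs replaced by the additional variable b_g.
inputs_per_variable = {b: {x for x in row.values() if x} for b, row in RI.items()}
sizes = {b: len(xs) for b, xs in inputs_per_variable.items()}

S_L, R_MB = 5, 3
target = S_L - R_MB  # desired |X(b_g)| for a single-LUT implementation
print(G, sizes, target)
```

Only $X(b_1)$ reaches the target size of two inputs; the sets for $b_2$ and $b_3$ contain three inputs each.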
Step 3. States $s_m \in S$ should be encoded in a way that minimizes the number of
literals in SBF (6). We denote by the symbol $S(b_g)$ the set of states in which FSM inputs $x_l \in X$
are replaced by the additional variable $b_g \in B$. To optimize SBF (6), we propose placing
the codes of states $s_m \in S(b_g)$ in the same rows of an $R_{MB}$-dimensional Karnaugh map. If
an input $x_l \in X$ is replaced by a variable $b_g \in B$ for states $s_m, s_i \in S(b_g)$, then we propose
placing these states into adjacent cells of the map. To optimize the SOP of $b_g \in B$, we can
use three types of insignificant assignments. They are the following: (1) the states with
unconditional transitions; (2) the states which do not belong to a particular set $S(b_g)$; and
(3) the combinations of state variables which are not used as state codes. For the discussed
example, the Karnaugh map (Figure 5) includes the state codes.
T3 \ T1T2   00   01   11   10
0           s1   s4   -    s2
1           s3   s5   s6   -
Figure 5. Outcome of maximum binary state assignment.
Let us explain how this map was created. There are the sets $S(b_1) = \{s_1, s_2, s_4\}$ and
$X(b_1) = \{x_1, x_4\}$. As follows from Figure 5, these states are placed in the same row of the
map. For states $s_1$ and $s_4$, the same input $x_1$ is replaced. So, these states have adjacent
codes 000 and 010. The code 001 (state $s_3$) can be thought of as insignificant because the
transition from this state is unconditional. The code 011 (state $s_5$) can be thought of as
insignificant because there is no input symbol in the row $b_1$ for this state. To optimize the
term depending on $s_2$, we can use the assignments 110 (no state), 111 (the symbol “–” in the row
$b_1$) and 101 (no state). As a result, the following
Boolean equation is obtained: $b_1 = x_1\bar{T}_1 \lor x_4 T_1$.
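The equation for $b_1$ can be checked by exhaustive simulation. The sketch below assumes the state codes read from the map of Figure 5 (they also appear in the $K(s_m)$ columns of the tables that follow); the helper names are hypothetical.

```python
from itertools import product

# Maximum binary state codes (T1, T2, T3) taken from the Karnaugh map.
K = {"s1": (0, 0, 0), "s2": (1, 0, 0), "s3": (0, 0, 1),
     "s4": (0, 1, 0), "s5": (0, 1, 1), "s6": (1, 1, 1)}


def hamming(a, b):
    """Number of positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


def b1(x1, x4, T1):
    """b1 = x1 * not(T1) + x4 * T1."""
    return int((x1 and not T1) or (x4 and T1))


# Input that b1 must reproduce in each state of S(b1).
replaced = {"s1": "x1", "s4": "x1", "s2": "x4"}


def b1_is_correct():
    for state, source in replaced.items():
        T1 = K[state][0]
        for x1, x4 in product((0, 1), repeat=2):
            expected = x1 if source == "x1" else x4
            if b1(x1, x4, T1) != expected:
                return False
    return True


print(hamming(K["s1"], K["s4"]), b1_is_correct())
```

States $s_3$, $s_5$ and $s_6$ are not checked because $b_1$ is insignificant for them.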
Step 4. Using the approach discussed above, we can obtain the following SBF:
$b_1 = x_1(A_1 \lor A_4) \lor x_4 A_2 = x_1\bar{T}_1 \lor x_4 T_1$;
$b_2 = x_2 A_1 \lor x_5 A_2 \lor x_7(A_4 \lor A_5) = x_2\bar{T}_1\bar{T}_2 \lor x_5 T_1 \lor x_7 T_2$; (21)
$b_3 = x_3(A_1 \lor A_6) \lor x_6 A_2 \lor x_8 A_4 = x_3\bar{T}_1\bar{T}_2\bar{T}_3 \lor x_3 T_1 T_2 \lor x_6 T_1\bar{T}_2 \lor x_8\bar{T}_1 T_2\bar{T}_3$.
The analysis of SBF (21) shows that the circuits implementing its equations have four
LUTs. The circuit for $b_1$ includes a single LUT, as does the circuit for $b_2$. The two-level
circuit generating $b_3$ includes two LUTs. Thus, in the discussed case, there are four LUTs
and two levels of logic in the circuit of LB.
Step 5. We use the approach proposed in the paper [18] to create the partition $\pi_S$.
Using the method [18] gives the following sets: $\pi_S = \{S^1, S^2\}$, $S^1 = \{s_1, s_2, s_4\}$ and
$S^2 = \{s_3, s_5, s_6\}$. Thus, $K = 2$.
Step 6. As follows from the analysis of classes $S^k \in \pi_S$, each class includes $M_k = 3$
states. Using (12) and (13) gives the following: $R_1 = R_2 = 2$, $R_S = 4$, $V^1 = \{v_1, v_2\}$,
$V^2 = \{v_3, v_4\}$ and $V = \{v_1, \ldots, v_4\}$. It is known that the partial state codes do not affect
the number of LUTs in the circuits of LBk [18]. Thus, we can assign them in the trivial
way: within each class, the codes are assigned in the order of growing state subscripts, and
each code $C(s_m)$ is the binary form of the ordinal number of the state in its class. This
approach gives the following codes:
C(s1) = C(s3) = 01, C(s2) = C(s5) = 10, and C(s4) = C(s6) = 11.
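The trivial assignment of partial codes can be sketched as follows. The code assumes that formula (12) has the form $R_k = \lceil \log_2(M_k + 1) \rceil$, which reserves the all-zero code for states outside the class; this form, the function names and the notion of an "extended" 4-bit code built by concatenating the class fields are assumptions made for illustration.

```python
# Partition of the state set into classes (outcome of Step 5).
CLASSES = {1: ["s1", "s2", "s4"], 2: ["s3", "s5", "s6"]}


def partial_codes(classes):
    """Give the i-th state of each class the binary form of i (i = 1, 2, ...)."""
    codes, widths = {}, {}
    for k, members in classes.items():
        # len(members).bit_length() equals ceil(log2(M_k + 1)).
        widths[k] = len(members).bit_length()
        for i, state in enumerate(members, start=1):
            codes[state] = format(i, f"0{widths[k]}b")
    return codes, widths


def extended_code(state, classes, codes, widths):
    """Concatenate the class fields; fields of foreign classes are all zeros."""
    return "".join(
        codes[state] if state in members else "0" * widths[k]
        for k, members in classes.items()
    )


codes, widths = partial_codes(CLASSES)
R_S = sum(widths.values())
print(codes, R_S, extended_code("s4", CLASSES, codes, widths))
```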
Step 7. As follows from Table 1, during the operation of the FSM A1, the following
COs are generated: $Y_1 = \emptyset$, $Y_2 = \{y_1, y_7\}$, $Y_3 = \{y_2\}$, $Y_4 = \{y_1, y_2\}$, $Y_5 = \{y_3, y_6\}$,
$Y_6 = \{y_1, y_3, y_7\}$, $Y_7 = \{y_4, y_9\}$, $Y_8 = \{y_4, y_5\}$, $Y_9 = \{y_5, y_8\}$, $Y_{10} = \{y_6, y_8, y_9\}$. Thus,
there are $Q = 10$ collections of outputs generated during the interstate transitions of FSM
A1. Using (9) gives $R_{CO} = 4$ and the set $Z = \{z_1, \ldots, z_4\}$.
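The collections of outputs and the value of $R_{CO}$ can be derived mechanically from the output column of the STT. The sketch below assumes that formula (9) is $R_{CO} = \lceil \log_2 Q \rceil$; the list of outputs follows the 17 transitions of FSM A1 with the collections $Y_1$–$Y_{10}$ listed above, and the function name is hypothetical.

```python
# Outputs produced in rows h = 1..17 of the STT ("" is the empty collection).
STT_OUTPUTS = [
    "", "y1 y7", "y2", "y1 y2", "y3 y6", "y1 y3 y7", "y4 y9", "y4 y5",
    "y5 y8", "y6 y8 y9", "y2", "", "y3 y6", "y4 y9", "y1 y2", "y1 y7", "y4 y5",
]


def number_collections(rows):
    """Number distinct output collections in the order of first appearance."""
    index = {}
    q_column = []
    for outputs in rows:
        key = frozenset(outputs.split())
        if key not in index:
            index[key] = len(index) + 1
        q_column.append(index[key])
    return index, q_column


index, q_column = number_collections(STT_OUTPUTS)
Q = len(index)
R_CO = (Q - 1).bit_length()  # assumed form of formula (9)
print(Q, R_CO, q_column)
```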
The encoding is executed in such a way as to reduce the total number of literals in
SOPs (11). This can be carried out using, for example, the approach from the work [44].
One of the possible outcomes is shown in Figure 6.
z3z4 \ z1z2   00    01    11    10
00            Y1    Y2    -     Y7
01            Y3    Y4    -     Y8
11            -     -     Y9    -
10            Y5    Y6    -     Y10
Figure 6. Codes of output collections.
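Step 8. Using codes $K(Y_q)$ and insignificant input assignments [1], we can obtain the
following SBF:
$y_1 = Y_2 \lor Y_4 \lor Y_6 = \bar{z}_1 z_2$;
$y_2 = Y_3 \lor Y_4 = \bar{z}_1 z_4$;
$y_3 = Y_5 \lor Y_6 = \bar{z}_1 z_3$;
$y_4 = Y_7 \lor Y_8 = z_1\bar{z}_3$;
$y_5 = Y_8 \lor Y_9 = z_1 z_4$;
$y_6 = Y_5 \lor Y_{10} = \bar{z}_2 z_3$; (22)
$y_7 = Y_2 \lor Y_6 = z_2\bar{z}_4$;
$y_8 = Y_9 \lor Y_{10} = z_1 z_3$;
$y_9 = Y_7 \lor Y_{10} = z_1\bar{z}_4$.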
The SBF (22) represents the circuit of block LY. Thus, it corresponds to SBF (11). The
maximum number of literals in the SOPs of (11) is determined as $N \times R_{CO}$. In the discussed
case, this number is equal to 9 × 4 = 36. The SBF (22) contains 18 literals. Thus, using the
approach [44] allows a reduction in the number of literals by a factor of 2.0 compared to
its maximum possible value. Each literal corresponds to an interconnection between the
blocks LTZ and LY. Thus, reducing the number of literals results in reducing the number of
interconnections. This is a positive factor because interconnections significantly influence
the chip area used, power consumption and performance.
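The system (22) and its literal count can be verified by evaluating every SOP on every used code. The sketch below is an illustration under assumptions: the codes $K(Y_q)$ are those implied by the $Z$ columns of the LB tables (for example, $K(Y_2) = 0100$), and the polarities of the literals are derived from these codes with unused codes treated as insignificant.

```python
# Codes z1z2z3z4 of the output collections Y1..Y10.
CODE = {1: "0000", 2: "0100", 3: "0001", 4: "0101", 5: "0010",
        6: "0110", 7: "1000", 8: "1001", 9: "1111", 10: "1010"}
# Outputs y_n (by subscript n) contained in each collection.
COLLECTIONS = {1: set(), 2: {1, 7}, 3: {2}, 4: {1, 2}, 5: {3, 6},
               6: {1, 3, 7}, 7: {4, 9}, 8: {4, 5}, 9: {5, 8}, 10: {6, 8, 9}}
# One product term per output: {subscript of z: required value}.
SOP = {1: {1: 0, 2: 1}, 2: {1: 0, 4: 1}, 3: {1: 0, 3: 1},
       4: {1: 1, 3: 0}, 5: {1: 1, 4: 1}, 6: {2: 0, 3: 1},
       7: {2: 1, 4: 0}, 8: {1: 1, 3: 1}, 9: {1: 1, 4: 0}}


def evaluate(n, code):
    """Value of output y_n for a given code z1z2z3z4."""
    return all(int(code[r - 1]) == value for r, value in SOP[n].items())


# Every output must be 1 exactly for the collections that contain it.
consistent = all(
    evaluate(n, CODE[q]) == (n in COLLECTIONS[q]) for q in CODE for n in SOP
)
literals = sum(len(term) for term in SOP.values())
N, R_CO = 9, 4
print(consistent, literals, N * R_CO / literals)
```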
Step 9. To create a table of LBk, it is necessary to use the STT rows representing
transitions from states $s_m \in S^k$. For example, to create a table representing LB1, we should
choose the rows 1–8 and 10–13 of Table 1. The column $X_h$ should be replaced by the
column $B^1_h$. This column includes the conjunctions of variables $b_g \in B$ corresponding to the
conjunctions of replaced inputs $x_l \in X$. The column $Y_h$ is replaced by the column $Z^1_h$. This
column includes the variables $z_r \in Z$ equal to 1 in the codes $K(Y_q)$ of COs shown in the
corresponding rows of the STT.
In addition, this table includes the columns $C(s_C)$ (the partial code of the current state),
$K(s_T)$ (the MBC of the next state), and $D^1_h$ (IMFs equal to 1 to load the code $K(s_T)$ into RG).
In the discussed case, this table contains $H_1 = 12$ rows (Table 3).
For example, the second row of Table 3 is created in the following manner. This row
is constructed using the second row of Table 1. This row describes the transition $\langle s_1, s_2 \rangle$
executed when the following relation takes place: $x_1 x_2 = 1$. During this transition, the
CO $Y_2 = \{y_1, y_7\}$ is produced. From the outcome of step 6, we have the code $C(s_1) = 01$.
This code should be placed in the column $C(s_C)$. Using the Karnaugh map (Figure 5) gives
the state code $K(s_T) = 100$. This code should be placed in the column $K(s_T)$. It determines
the existence of the symbol $D_1$ in the column $D^1_h$ ($h = 2$) of Table 3. The code $K(Y_2) = 0100$
(Figure 6) determines the symbol $z_2$ in the column $Z^1_h$. As follows from the column
$s_1$ of Table 2, the input $x_1$ is replaced by $b_1$ and the input $x_2$ is replaced by the variable
$b_2$. Thus, the conjunction $x_1 x_2$ is replaced by the conjunction $b_1 b_2$ written in the column
$B^1_h$ ($h = 2$) of Table 3.
Table 3. Table of block LB1.
s_C   C(s_C)   s_T   K(s_T)   B^1_h     Z^1_h   D^1_h     h
s1    01       s1    000      b1        -       -         1
s1    01       s2    100      b1b2      z2      D1        2
s1    01       s5    011      b1b2b3    z4      D2D3      3
s1    01       s3    001      b1b2b3    z2z4    D3        4
s2    10       s2    100      b1        z3      D1        5
s2    10       s4    010      b1b2      z2z3    D2        6
s2    10       s6    111      b1b2b3    z1      D1D2D3    7
s2    10       s5    011      b1b2b3    z1z4    D2D3      8
s4    11       s4    010      b1        z1z3    D2        9
s4    11       s6    111      b1b2      z4      D1D2D3    10
s4    11       s1    000      b1b2b3    -       -         11
s4    11       s2    100      b1b2b3    z1      D1        12
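A similar approach is used to create all the rows of Tables 3 (block LB1) and 4
(block LB2). These tables represent SBFs (15) and (16). Examples of some SOPs are
shown below:
$z^1_1 = v_1\bar{v}_2 b_1 b_2 \lor v_1 v_2 b_1 \lor v_1 v_2 b_2 b_3$;
$D^1_3 = \bar{v}_1 v_2 b_1 b_2 \lor v_1\bar{v}_2 b_1 b_2 \lor v_1 v_2 b_1 b_2$. (23)
$z^2_1 = \bar{v}_3 v_4 \lor v_3\bar{v}_4 b_2 \lor v_3 v_4 b_3$;
$D^2_3 = \bar{v}_3 v_4 \lor v_3\bar{v}_4$. (24)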
Step 10. The table of block LTZ includes the following columns: “Function” (the
column includes symbols $D_r \in D$ and $z_r \in Z$), LB1, LB2. If a PBF is generated by the
block LBk ($k \in \{0, 1, \ldots, K\}$), then the intersection of the row with this function and the
column LBk is marked by 1. Otherwise, this intersection contains zero. The block LTZ is
represented by Table 5.
Table 4. Table of block LB2.
s_C   C(s_C)   s_T   K(s_T)   B^2_h   Z^2_h      D^2_h     h
s3    01       s6    111      1       z1z2z3z4   D1D2D3    1
s5    10       s6    111      b2      z1         D1D2D3    2
s5    10       s3    001      b2      z2z4       D3        3
s6    11       s4    010      b3      z2         D2        4
s6    11       s1    000      b3      z1z4       -         5
To fill the columns LB1 and LB2, we use Tables 3 and 4, respectively. In the discussed
case, Table 5 determines SBFs (17) and (18). For example, the following disjunctions may
be derived from Table 5:
$z_1 = z^1_1 \lor z^2_1$;
$D_3 = D^1_3 \lor D^2_3$. (25)
Step 11. The block LV converts MBC codes $K(s_m)$ into the partial state codes $C(s_m)$.
The conversion is executed for all states. The table of LV includes the columns $s_m$, $K(s_m)$,
$C(s_m)$, $V_m$. If there is $v_r = 1$ for a particular code $C(s_m)$, then there is the symbol $v_r$ in the
column $V_m$ (Table 6).
Table 5. Table of LTZ.
Function   LB1   LB2
D1         1     1
D2         1     1
D3         1     1
z1         1     1
z2         1     1
z3         1     1
z4         1     1
Table 6. Table of block LV.
s_m   K(s_m)   C(s_m)   V_m
s1    000      0100     v2
s2    100      1000     v1
s3    001      0001     v4
s4    010      1100     v1v2
s5    011      0010     v3
s6    111      0011     v3v4
Using Table 6, it is possible to create SBF (14) represented by its perfect SOPs. To
minimize these SOPs, we can create a multi-functional Karnaugh map, as shown in Figure 7.
T3 \ T1T2   00    01     11     10
0           v2    v1v2   -      v1
1           v4    v3     v3v4   -
Figure 7. Multi-functional map of LV.
This Karnaugh map is created using the codes from Figure 5. In Figure 7, the symbols
of states $s_m \in S$ are replaced by symbols of additional variables $v_r \in V$. This is performed
in the following way: if a particular cell of Figure 5 includes a state $s_m \in S^k$, then the
symbols $v_r \in V^k$ are rewritten into the corresponding cell of Figure 7. Using Figure 7 gives
the following SBF, which determines the contents of LUTs from the block LV:
$v_1 = A_2 \lor A_4 = T_1\bar{T}_3 \lor T_2\bar{T}_3$;
$v_2 = A_1 \lor A_4 = \bar{T}_1\bar{T}_3$;
$v_3 = A_5 \lor A_6 = T_2 T_3$; (26)
$v_4 = A_3 \lor A_6 = \bar{T}_2 T_3 \lor T_1 T_3$.
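The code conversion performed by the block LV can be simulated directly. The following sketch assumes the minimized form of (26) in which the negations are derived from the state codes, with the unused codes 101 and 110 treated as insignificant; the function name `lv_outputs` is introduced here for illustration.

```python
# K(s_m) and the 4-bit codes v1v2v3v4 from the table of block LV.
K = {"s1": "000", "s2": "100", "s3": "001",
     "s4": "010", "s5": "011", "s6": "111"}
C_EXT = {"s1": "0100", "s2": "1000", "s3": "0001",
         "s4": "1100", "s5": "0010", "s6": "0011"}


def lv_outputs(code):
    """Evaluate v1..v4 for a state code T1T2T3 given as a bit string."""
    T1, T2, T3 = (bit == "1" for bit in code)
    v1 = (T1 and not T3) or (T2 and not T3)
    v2 = (not T1) and (not T3)
    v3 = T2 and T3
    v4 = ((not T2) and T3) or (T1 and T3)
    return "".join("1" if v else "0" for v in (v1, v2, v3, v4))


# States for which the simulated block disagrees with the table (none expected).
mismatches = [s for s in K if lv_outputs(K[s]) != C_EXT[s]]
print(mismatches)
```

Each of the four functions depends on at most three state variables, so a single LUT per function is sufficient.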
Step 12. Using the obtained SOPs, we can estimate how many LUTs are necessary to
implement the circuit of MPTY(A1). As follows from SBF (21), condition (19) holds for
the SOPs of functions $b_1, b_2 \in B$. Thus, each of these functions is implemented using a single LUT
with $S_L = 5$. There are six literals in the SOP of $b_3 \in B$. Thus, this SOP should be decomposed.
As a result, the corresponding circuit includes two LUTs connected in series. Due to this,
the circuit of LB includes four LUTs and has two levels of logic (Figure 8).
Figure 8. Circuit of block LB for Mealy FSM MPTY(A1). [Four LUTs: LUT1 generates $b_1$, LUT2 generates $b_2$, and LUT3 generates the intermediate function $f_1$ used by LUT4 to generate $b_3$.]
Each of the blocks LB1, LB2 (the second level of logic) and LTZ (the third level of
logic) has a circuit with seven LUTs. Each of these circuits is single-level. The fourth level
consists of circuits for blocks LY (nine LUTs) and LV (four LUTs).
Because the first-level block LB is itself a two-level circuit, the resulting circuit has five
levels of LUTs. It includes 4 + 7 + 7 + 7 + 9 + 4 = 38 LUTs. Our analysis of Mealy
FSM MPY(A1) shows the following. There are the same LUT counts for the circuits of
the blocks LB and LY of equivalent MPY and MPTY FSMs. Thus, in the discussed case,
these blocks include 4 + 9 = 13 LUTs. There are $R_{MB} + G = 6$ literals in the SOPs of
SBFs (7) and (10). Using LUTs with five inputs leads to the functional decomposition of
these SOPs. As a result, there are three LUTs in a two-level circuit implementing any
function from SBFs (7) and (10). There are $R_{MB} + R_{CO} = 7$ functions generated by the LTZ
of Mealy FSM MPY(A1). Thus, there are 21 LUTs in this circuit. This calculation gives
13 + 21 = 34 LUTs in the circuit of Mealy FSM MPY(A1). The circuit has five levels of LUTs.
Thus, there is the same number of levels in the circuits of FSMs MPY(A1) and
MPTY(A1). However, the circuit of Mealy FSM MPY(A1) includes fewer LUTs. It is
possible to obtain the same LUT count for both circuits if we change the approach for the
encoding of states and COs [16]. However, we do not discuss this approach in our current
paper.
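The LUT counts of both circuits can be tallied block by block. The sketch below only repeats the arithmetic of this example; the dictionary names are introduced here for illustration.

```python
# LUT counts per block for MPTY(A1), as estimated in Step 12.
MPTY_BLOCKS = {"LB": 4, "LB1": 7, "LB2": 7, "LTZ": 7, "LY": 9, "LV": 4}

# For MPY(A1), each of the R_MB + R_CO functions of LTZ needs three LUTs
# (a two-level circuit), while LB and LY are the same as in MPTY(A1).
R_MB, R_CO = 3, 4
LUTS_PER_LTZ_FUNCTION = 3
MPY_BLOCKS = {"LB": 4, "LTZ": (R_MB + R_CO) * LUTS_PER_LTZ_FUNCTION, "LY": 9}

mpty_total = sum(MPTY_BLOCKS.values())
mpy_total = sum(MPY_BLOCKS.values())
print(mpty_total, mpy_total, mpty_total - mpy_total)
```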
Our example is rather simple. It is necessary to compare equivalent FSMs based on
various approaches using some benchmarks with a wide range of characteristics. Such a
comparison is given in the next section. This comparison is executed for FPGAs produced
by AMD Xilinx. Due to this, the industrial package Vivado [39] is applied to fulfil all the
necessary steps of technology mapping [7,26,45].
6. Experimental Results
To compare the LUT-based circuits produced by our proposed method with circuits
obtained using some known design methods, we use 48 benchmarks from the library
LGSynth93 [31]. These benchmarks have a wide range of main characteristics, such
as the numbers of transitions, internal states, input variables, output functions, and collections
of FSM outputs. The benchmarks are represented by STTs in the format KISS2. The choice
of this library is based on the fact that a lot of FSM designers use it to compare their results
with the main characteristics of known FSM circuits [27,36,37,46–48]. The characteristics of the
benchmark FSMs can be found, for example, in our previous articles. Due to this, we do
not show them in our current paper.
To conduct the experiments, we use the Virtex-7 VC709 platform [49]
based on the FPGA chip xc7vx690tffg1761-2 (AMD Xilinx). The CLBs of this chip include LUTs
with six address inputs. To obtain the FSM circuits, we use the industrial package Vivado
v2019.1 (64-bit) [39] produced by AMD Xilinx. To process the benchmarks, we use their
VHDL-based models. To transform the KISS2-based benchmark files into VHDL codes, the
CAD tool K2F [50] is applied.
For each benchmark, we use Vivado reports to find the LUT counts and performance
(the values of cycle time and maximum operating frequency). We compare the proposed
FSM model with four different FSM models. Three of these models are P FSMs based on:
(1) Auto of Vivado (P Mealy FSMs with MBCs); (2) One-hot of Vivado (one-hot-based P
Mealy FSMs); (3) JEDI (P Mealy FSMs with MBCs). As the fourth model, we investigate the
MPY Mealy FSMs.
In our research, we take into account the fact that FSMs are not stand-alone units.
To achieve the stability of the outputs, we use an additional synchronous register. In the
cases of P FSMs, the inputs are loaded into this register. Thus, it consists of $L$ flip-flops.
Obviously, to implement this register, it is necessary to use $L$ additional LUTs. In the
cases of both MPY and MPTY FSMs, this register keeps the codes of COs. Thus, it has
$R_{CO}$ flip-flops and does not require additional LUTs. In addition, it does not require an
additional synchronization pulse. This simplifies the synchronization circuit compared
with equivalent P FSMs.
The results of experiments [16,17] show that practically all the characteristics of LUT-based
FSM circuits strongly depend on the relation between the values of $L + R_{MB}$, on the
one hand, and $S_L$, on the other hand. In experiments, we use Virtex-7 FPGAs for which
$S_L = 6$. We divided the set of benchmarks by classes of complexity (CC). If the symbol
CCP ($P = 1, 2, \ldots$) means a class number, then the class to which a benchmark belongs
is determined by the expression
$CCP = \lceil (L + R_{MB})/S_L \rceil - 1$. (27)
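Expression (27) is easy to evaluate programmatically. The sketch below assumes that (27) reads $\lceil (L + R_{MB})/S_L \rceil - 1$; the function name is hypothetical, and the FSM A1 from the synthesis example ($L = 8$, $R_{MB} = 3$) is used only as an illustration.

```python
def complexity_class(num_inputs: int, r_mb: int, s_l: int = 6) -> int:
    """Class of complexity: ceil((L + R_MB) / S_L) - 1, in integer arithmetic."""
    if s_l < 1 or num_inputs + r_mb < 1:
        raise ValueError("S_L and L + R_MB must be positive")
    # -(-a // b) is the ceiling of a / b for positive integers.
    return -(-(num_inputs + r_mb) // s_l) - 1


# FSM A1 has L = 8 inputs and R_MB = 3 state variables.
print(complexity_class(8, 3))   # class for S_L = 6
print(complexity_class(2, 2))   # an FSM with L + R_MB <= S_L is trivial
```

An FSM falls into CC0 exactly when $L + R_{MB} \le S_L$, that is, when any function of the inputs and state variables fits into a single LUT.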
For the library used, there are five classes of complexity (CC0-CC4). In each of the
following tables, the benchmarks belonging to a certain class are shown in the column
“Class of complexity”. The class CC0 includes trivial FSMs. The class CC1 includes simple
FSMs. The class CC2 includes average FSMs. The class CC3 includes big FSMs. Finally, the
class CC4 includes very big FSMs.
Tables 7–16 contain the results of the experiments conducted. Table 7 includes the
numbers of LUTs necessary to implement the circuit for a given benchmark. All
benchmarks are represented in this table. Table 8 contains the LUT counts for classes
CC0–CC1. Table 9 contains the LUT counts for classes CC2–CC4. The negative influence
of the number of FSM inputs is shown in Table 10. Table 11 contains the values of the
minimum cycle times for each benchmark. The data for these tables are taken from the
Vivado reports. In addition, we show cycle times separately for classes CC0–CC1 (Table 12)
and CC2–CC4 (Table 13). The values of the maximum operating frequencies are shown in
Table 14. These values are obtained in a simple way using data from Table 11. In addition,
we show the frequencies separately for classes CC0–CC1 (Table 15) and CC2–CC4 (Table 16).
Each table is organized in the same manner. The first column includes the benchmarks'
names, the row “Total” and the row “Percentage”. The names of the investigated methods
are shown in the next five columns. The classes of complexity are shown in the last column.
The row “Total” shows the results of the summation of values for a particular column.
Finally, the row “Percentage” includes the percentage of the summarized characteristics of
various FSM circuits in relation to the summarized characteristics of MPTY FSMs. We start
the discussion of the results with Table 7.
As follows from Table 7, as compared to other investigated methods, the circuits of
MPTY-based FSMs consist of the minimum number of LUTs. There is the following gain:
(1) 56.99% compared to Auto-based FSMs; (2) 79.13% compared to One-hot-based FSMs;
(3) 33.13% compared to JEDI-based FSMs; and (4) 8.98% compared to MPY-based FSMs.
In second place in terms of gain are MPY-based FSMs. We think this gain is associated
with two factors. First, for rather complex FSMs, SD-based circuits always have fewer
LUTs than equivalent FD-based FSMs [9]. Second, there are an additional $L$ LUTs in the
circuits of FD-based FSMs required to stabilize their operation. In the case of both MPY-
and MPTY-based FSMs, the stabilization is achieved by registering the codes of COs. To
produce these codes, LUTs of LTZ are used. The outputs of these LUTs are connected with
$R_{CO}$ flip-flops creating the additional register. Thus, there is no need for additional LUTs.
Of course, the gain is also associated with replacing FSM inputs with additional variables.
We think that this diminishes the number of partial functions compared to equivalent
FD-based FSMs.
It is interesting to show how the gain changes with the change in FSM complexity.
Using Table 7, we created two additional tables. Table 8 shows LUT counts for trivial
and simple FSMs. Table 9 contains information about LUT counts for average, big and
very big FSMs.
Analysis of Table 8 shows that the proposed approach provides the same LUT counts
as equivalent MPY FSMs. All P-based models require more LUTs. Our approach gives
the following gain: (1) 24.89% compared to Auto-based FSMs; (2) 56.11% compared to
One-hot-based FSMs; and (3) 9.61% compared to JEDI-based FSMs. We think that this
gain is connected to the different stabilization methods used in SD- and FD-based FSMs.
The input register of FD-based FSMs requires more LUTs than the output register of
SD-based FSMs. However, both MPY- and MPTY-based FSMs require more LUTs than P-based
models for trivial FSMs (the complexity class CC0). We think this has a very simple
explanation. Namely, for trivial FSMs, the condition (5) holds. Thus, there is no need to apply
the SD-based methods. However, these methods are always used during the synthesis of both
MPY- and MPTY-based FSMs. In this case, it is necessary to implement circuits of blocks LB
and LY. It is the presence of these absolutely redundant blocks that determines the marked
loss of SD-based methods.
The next phenomenon comes from Table 8: for the class CC0, the circuits of equivalent
MPY- and MPTY-based FSMs have equal amounts of LUTs. We think this is connected with
the fact that the partition $\pi_S$ consists of one class. Due to this, there is no need to use the
blocks LB1–LBK. This means that MPTY FSMs turn into MPY FSMs. Obviously, these FSM
circuits should have equal values for all the other characteristics. This, once again, indicates
that it is advisable to use different FSM models for different conditions. Thus, it makes no
sense to apply SD-based methods when condition (5) is met.
Table 7. Experimental results (the LUT counts).
Benchmark Auto One-Hot JEDI MPY MPTY Class of Complexity
bbara 21 21 14 12 12 CC1
bbsse 40 44 31 14 14 CC1
bbtas 7 7 7 9 9 CC0
beecount 22 22 17 13 13 CC1
cse 47 73 43 18 18 CC1
dk14 19 30 13 12 12 CC1
dk15 18 19 15 11 11 CC1
dk16 17 36 14 14 14 CC1
dk17 7 14 7 9 9 CC0
dk27 4 6 5 8 8 CC0
dk512 11 11 10 14 14 CC0
donfile 33 33 26 21 21 CC1
ex1 79 83 62 28 24 CC2
ex2 11 11 10 11 11 CC1
ex3 11 11 11 16 16 CC0
ex4 21 19 18 12 12 CC1
ex5 11 11 11 15 15 CC0
ex6 29 41 27 21 21 CC1
ex7 6 7 6 10 10 CC1
keyb 50 68 47 28 28 CC1
kirkman 54 70 51 28 22 CC2
lion 4 7 4 10 10 CC0
lion9 8 13 7 12 12 CC0
mark1 28 28 25 22 22 CC1
mc 7 10 7 12 12 CC0
modulo12 8 8 8 11 11 CC0
opus 33 33 27 20 20 CC1
planet 138 138 95 76 68 CC2
planet1 138 138 95 76 68 CC2
pma 102 102 94 74 62 CC2
s1 73 107 69 52 48 CC2
s1488 132 139 116 86 79 CC2
s1494 134 140 118 92 83 CC2
s1a 57 89 51 42 35 CC2
s208 23 42 21 20 18 CC2
s27 10 22 10 12 12 CC1
s386 33 46 29 31 31 CC1
s420 29 50 28 24 20 CC4
s510 67 67 51 42 36 CC4
s820 13 13 13 14 14 CC1
s832 106 100 86 70 62 CC4
s840 98 97 80 68 56 CC4
sand 143 143 125 99 83 CC3
shiftreg 3 7 3 8 8 CC0
sse 40 44 37 38 38 CC1
styr 102 129 90 81 79 CC2
tma 52 46 46 41 36 CC2
Total 2099 2395 1780 1457 1337
Percentage, % 156.99 179.13 133.13 108.98 100.00
Now, we are going to discuss the temporal characteristics of FSM circuits. First of all,
we show the negative influence of the input register. In all P-based FSMs, the stabilization of
operation is achieved by loading FSM inputs into the additional register. Thus, this
approach leads to the use of $L$ additional LUTs and flip-flops. Obviously, the cycle time
increases due to the presence of the chain ⟨inputs–LUTs–flip-flops–LUTs of LB⟩. In addition,
this increases the consumed power. We explored how the number of inputs affects the time
and power characteristics of resulting circuits. This information is shown in Table 10.
As follows from Table 10, the number of inputs significantly affects the timing and
energy characteristics of LUT-based FSM circuits. The more inputs the FSM has, the greater
their negative impact. In the case of the investigated SD-based FSMs, the stabilization
is achieved by registering the codes of COs. In this case, the number of additional
flip-flops is equal to $R_{CO}$. Moreover, there is no need for additional LUTs because the codes
of COs are generated by the LUTs of LTZ. For the studied benchmarks, the
following relation holds: $R_{CO} < L$. The validity of this relation determines the gain in
time characteristics obtained due to the transition from FD-based FSMs to SD-based FSMs.
This gain is shown in Table 11.
As follows from Table 11, the SD-based FSMs have the best values of cycle time.
Our proposed method produces FSM circuits which are a bit faster than the circuits of
MPY-based FSMs (the average gain is 0.76%). In addition, our method has the following
average gain compared to other FSMs: (1) 70.65% compared to Auto-based FSMs; (2) 71.08%
compared to One-hot-based FSMs; and (3) 62.13% compared to JEDI-based FSMs. This gain
for the SD-based FSMs is explained by the difference in the methods used for stabilizing
the FSM operation, as discussed before.
To show the influence of FSM complexity, we create two additional tables. Table 12
includes information about the cycle times for trivial and simple FSMs. Table 13 includes
information about the cycle times for average, big and very big FSMs.
Table 8. Experimental results (the LUT counts for classes CC0-CC1).
Benchmark Auto One-Hot JEDI MPY MPTY Class of Complexity
bbara 21 21 14 12 12 CC1
bbsse 40 44 31 14 14 CC1
bbtas 7 7 7 9 9 CC0
beecount 22 22 17 13 13 CC1
cse 47 73 43 18 18 CC1
dk14 19 30 13 12 12 CC1
dk15 18 19 15 11 11 CC1
dk16 17 36 14 14 14 CC1
dk17 7 14 7 9 9 CC0
dk27 4 6 5 8 8 CC0
dk512 11 11 10 14 14 CC0
donfile 33 33 26 21 21 CC1
ex2 11 11 10 11 11 CC1
ex3 11 11 11 16 16 CC0
ex4 21 19 18 12 12 CC1
ex5 11 11 11 15 15 CC0
ex6 29 41 27 21 21 CC1
ex7 6 7 6 10 10 CC1
keyb 50 68 47 28 28 CC1
lion 4 7 4 10 10 CC0
lion9 8 13 7 12 12 CC0
mark1 28 28 25 22 22 CC1
mc 7 10 7 12 12 CC0
modulo12 8 8 8 11 11 CC0
opus 33 33 27 20 20 CC1
s27 10 22 10 12 12 CC1
s386 33 46 29 31 31 CC1
s820 13 13 13 14 14 CC1
shiftreg 3 7 3 8 8 CC0
sse 40 44 37 38 38 CC1
Total 572 715 502 458 458
Percentage, % 124.89 156.11 109.61 100.00 100.00
Table 9. Experimental results (the LUT counts for classes CC2-CC4).
Benchmark Auto One-Hot JEDI MPY MPTY Class of Complexity
ex1 79 83 62 28 24 CC2
kirkman 54 70 51 28 22 CC2
planet 138 138 95 76 68 CC2
planet1 138 138 95 76 68 CC2
pma 102 102 94 74 62 CC2
s1 73 107 69 52 48 CC2
s1488 132 139 116 86 79 CC2
s1494 134 140 118 92 83 CC2
s1a 57 89 51 42 35 CC2
s208 23 42 21 20 18 CC2
s420 29 50 28 24 20 CC4
s510 67 67 51 42 36 CC4
s832 106 100 86 70 62 CC4
s840 98 97 80 68 56 CC4
sand 143 143 125 99 83 CC3
styr 102 129 90 81 79 CC2
tma 52 46 46 41 36 CC2
Total 1527 1680 1278 999 879
Percentage, % 173.72 191.13 145.39 113.65 100.00
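The rows “Total” and “Percentage” of Table 9 can be reproduced from its body. The following Python sketch is only an illustration of how these rows are formed; the data are transcribed from Table 9 and the identifiers are introduced here.

```python
# (benchmark, Auto, One-Hot, JEDI, MPY, MPTY) LUT counts for classes CC2-CC4.
ROWS = [
    ("ex1", 79, 83, 62, 28, 24), ("kirkman", 54, 70, 51, 28, 22),
    ("planet", 138, 138, 95, 76, 68), ("planet1", 138, 138, 95, 76, 68),
    ("pma", 102, 102, 94, 74, 62), ("s1", 73, 107, 69, 52, 48),
    ("s1488", 132, 139, 116, 86, 79), ("s1494", 134, 140, 118, 92, 83),
    ("s1a", 57, 89, 51, 42, 35), ("s208", 23, 42, 21, 20, 18),
    ("s420", 29, 50, 28, 24, 20), ("s510", 67, 67, 51, 42, 36),
    ("s832", 106, 100, 86, 70, 62), ("s840", 98, 97, 80, 68, 56),
    ("sand", 143, 143, 125, 99, 83), ("styr", 102, 129, 90, 81, 79),
    ("tma", 52, 46, 46, 41, 36),
]
METHODS = ["Auto", "One-Hot", "JEDI", "MPY", "MPTY"]

# Row "Total": column sums. Row "Percentage": totals relative to MPTY.
totals = {m: sum(row[i + 1] for row in ROWS) for i, m in enumerate(METHODS)}
percentage = {m: round(100 * totals[m] / totals["MPTY"], 2) for m in METHODS}
gain_over_mpy = round(percentage["MPY"] - 100, 2)
print(totals, percentage, gain_over_mpy)
```

The value `gain_over_mpy` corresponds to the gain of MPTY FSMs over MPY FSMs for classes CC2–CC4.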
Table 10. Influence of input register on cycle time and consumed power.
L Power [W] Data Path Delay [ns]
1 0.356 3.471
2 0.367 3.599
3 0.380 3.603
4 0.392 3.640
5 0.406 3.667
6 0.418 3.688
7 0.431 3.729
8 0.448 3.793
9 0.462 3.800
10 0.477 3.705
11 0.491 3.767
12 0.511 3.898
18 0.608 4.112
19 0.623 4.113
As follows from Table 12, the time characteristics are equal for SD-based trivial and
simple FSMs. They have the following gain: (1) 65.63% compared with both Auto- and
One-hot-based FSMs and (2) 59.60% compared with JEDI-based FSMs. The reasons for
this situation are as discussed before.
Table 11. Experimental results (the cycle time, nanoseconds).
Benchmark Auto One-Hot JEDI MPY MPTY Class of Complexity
bbara 8.811 8.811 8.352 5.214 5.214 CC1
bbsse 10.096 9.642 9.213 5.226 5.226 CC1
bbtas 8.497 8.497 8.451 5.308 5.308 CC0
beecount 9.605 9.605 8.941 5.373 5.373 CC1
cse 10.558 9.840 9.343 5.453 5.453 CC1
dk14 8.821 9.395 8.762 5.839 5.839 CC1
dk15 8.797 8.998 8.735 5.219 5.219 CC1
dk16 9.491 9.320 8.672 5.245 5.245 CC1
dk17 8.617 9.587 8.617 5.400 5.400 CC0
dk27 8.325 8.424 8.369 5.195 5.195 CC0
dk512 8.566 8.566 8.477 4.119 4.119 CC0
donfile 9.033 9.034 8.509 5.168 5.168 CC1
ex1 10.425 10.955 9.454 5.821 5.741 CC2
ex2 8.635 8.635 8.596 5.624 5.624 CC1
ex3 8.731 8.731 8.707 5.931 5.931 CC0
ex4 9.214 9.315 8.874 5.481 5.481 CC1
ex5 9.147 9.147 9.119 5.425 5.425 CC0
ex6 9.564 9.772 9.330 5.369 5.369 CC1
ex7 8.598 8.578 8.584 5.200 5.200 CC1
keyb 10.121 10.699 9.666 5.265 5.265 CC1
kirkman 10.971 10.392 10.280 5.612 5.482 CC2
lion 8.539 8.501 8.541 6.062 6.062 CC0
lion9 8.470 8.998 8.444 5.270 5.270 CC0
mark1 9.825 9.825 9.343 6.395 6.395 CC1
mc 8.688 8.719 8.682 6.099 6.099 CC0
modulo12 8.302 8.302 8.299 5.928 5.928 CC0
opus 9.684 9.684 9.275 5.322 5.322 CC1
planet 11.264 11.264 9.073 6.018 5.878 CC2
planet1 11.264 11.264 9.073 6.018 5.834 CC2
pma 10.634 10.634 9.681 6.101 6.101 CC2
s1 10.623 11.154 10.156 5.830 5.707 CC2
s1488 11.013 11.372 10.155 6.432 6.206 CC2
s1494 10.487 10.654 9.878 5.723 5.511 CC2
s1a 10.313 9.462 9.704 5.689 5.511 CC2
s208 9.503 9.434 9.361 6.125 5.835 CC2
s27 8.672 8.862 8.662 6.387 6.387 CC1
s386 9.676 9.494 9.311 6.164 6.164 CC1
s420 9.864 9.780 9.755 5.868 6.028 CC4
s510 9.742 9.742 9.155 5.324 5.834 CC4
s820 10.691 10.641 9.775 5.726 5.726 CC1
s832 10.975 10.638 9.866 6.724 6.401 CC4
s840 9.195 9.228 9.158 6.232 5.882 CC4
sand 12.390 12.390 11.652 7.221 7.087 CC3
shiftreg 8.302 7.265 7.091 5.564 5.564 CC0
sse 10.096 9.642 9.455 5.561 5.561 CC1
styr 11.067 11.497 10.666 5.921 5.719 CC2
tma 9.831 10.495 9.821 5.702 5.596 CC2
Total 453.73 454.88 431.08 267.89 265.88
Percentage, % 170.65 171.08 162.13 100.76 100.00
As follows from Table 13, starting from the complexity CC2, our approach wins in
performance. There is the following gain: (1) 78.93% compared with Auto-based FSMs;
(2) 79.72% compared with One-hot-based FSMs; (3) 66.3% compared with JEDI-based
FSMs and (4) 2.0% compared with equivalent MPY FSMs. We think that the superiority of
SD-based FSMs is due to the fact that they generate fewer partial Boolean functions. Due to
this, their circuits have fewer logic levels and interconnections. In turn, they are faster.
The slight superiority of MPTY FSMs (2%) in relation to MPY FSMs is due to the fact
that MPTY FSMs have fewer interconnections. This is connected with different approaches
of stabilization. Since interconnections significantly affect the timing characteristics, our
approach produces faster circuits for FSMs from the classes CC2-CC4. Apparently,
equivalent SD-based FSMs have the same number of logic levels (the number of series-connected
LUTs). Thus, with respect to the other methods under study, the performance of MPTY
FSMs improves as their complexity increases.
We did not obtain the values of maximum operating frequencies from Vivado reports.
However, we calculated them using the values of cycle times. The frequency comparison is
represented by Table 14.
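The conversion from cycle time to maximum operating frequency is a simple reciprocal. The sketch below assumes that the frequency is expressed in megahertz, so that a cycle time in nanoseconds gives $f = 1000 / t$; the function name is hypothetical, and the two cycle times are the Auto and MPTY values for the benchmark bbara from Table 11.

```python
def max_frequency_mhz(cycle_time_ns: float) -> float:
    """Maximum operating frequency in MHz for a cycle time given in ns."""
    if cycle_time_ns <= 0:
        raise ValueError("cycle time must be positive")
    return 1000.0 / cycle_time_ns


# Cycle times of the benchmark bbara (Auto and MPTY columns of Table 11).
for method, cycle_time in (("Auto", 8.811), ("MPTY", 5.214)):
    print(method, round(max_frequency_mhz(cycle_time), 2))
```

Because the frequency is the reciprocal of the cycle time, a percentage gain in frequency is generally not equal to the percentage gain in cycle time for the same pair of circuits.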
As follows from Table 14, on average, the circuits of MPTY-based FSMs are faster in
relation to all other models. There is the following gain: (1) 58.79% compared to Auto-based
FSMs; (2) 58.7% compared to One-hot-based FSMs; (3) 61.65% compared to JEDI-based
FSMs; and (4) 0.64% compared to MPY-based FSMs. Obviously, the reasons for this gain
are the same as the ones discussed for the cycle times. We will not repeat them.
Naturally, the change in the gain in frequency has the same tendencies as the change
in the gain in cycle time. This statement is justified by information from Tables 15 and 16.
It should be noted that the gain in operating frequency for our method begins to
appear from the complexity CC2. At the same time, the gain grows in the process of the
transition to the highest categories of complexity.
Thus, if FSMs belong to the classes CC0-CC1, then equivalent MPTY and MPY FSMs
have the same values of LUT counts, cycle time and maximum operating frequency. For
more complex FSMs, MPTY FSMs require fewer LUTs than equivalent MPY FSMs.
However, as the complexity increases, the temporal characteristics of the
MPTY FSMs gradually become slightly better than those of equivalent MPY FSMs. This
gain is rather small; however, the very fact that a decrease in the number of LUTs does not
lead to performance degradation is important. The results of the experiments allow us to
draw the following conclusion: MPTY FSMs can replace MPY FSMs for average, big and
very big sequential devices. For a more visual assessment of the results, we built a diagram
(Figure 9). This diagram shows a comparison of percentages for the main characteristics of
the studied methods.
Table 12. Cycle times for classes CC0-CC1 (nanoseconds).
Benchmark Auto One-Hot JEDI MPY MPTY Class of Complexity
bbara 8.811 8.811 8.352 5.214 5.214 CC1
bbsse 10.096 9.642 9.213 5.226 5.226 CC1
bbtas 8.497 8.497 8.451 5.308 5.308 CC0
beecount 9.605 9.605 8.941 5.373 5.373 CC1
cse 10.558 9.840 9.343 5.453 5.453 CC1
dk14 8.821 9.395 8.762 5.839 5.839 CC1
dk15 8.797 8.998 8.735 5.219 5.219 CC1
dk16 9.491 9.320 8.672 5.245 5.245 CC1
dk17 8.617 9.587 8.617 5.400 5.400 CC0
dk27 8.325 8.424 8.369 5.195 5.195 CC0
dk512 8.566 8.566 8.477 4.119 4.119 CC0
donfile 9.033 9.034 8.509 5.168 5.168 CC1
ex2 8.635 8.635 8.596 5.624 5.624 CC1
ex3 8.731 8.731 8.707 5.931 5.931 CC0
ex4 9.214 9.315 8.874 5.481 5.481 CC1
ex5 9.147 9.147 9.119 5.425 5.425 CC0
ex6 9.564 9.772 9.330 5.369 5.369 CC1
ex7 8.598 8.578 8.584 5.200 5.200 CC1
keyb 10.121 10.699 9.666 5.265 5.265 CC1
lion 8.539 8.501 8.541 6.062 6.062 CC0
lion9 8.470 8.998 8.444 5.270 5.270 CC0
mark1 9.825 9.825 9.343 6.395 6.395 CC1
mc 8.688 8.719 8.682 6.099 6.099 CC0
modulo12 8.302 8.302 8.299 5.928 5.928 CC0
opus 9.684 9.684 9.275 5.322 5.322 CC1
s27 8.672 8.862 8.662 6.387 6.387 CC1
s386 9.676 9.494 9.311 6.164 6.164 CC1
s820 10.691 10.641 9.775 5.726 5.726 CC1
shiftreg 8.302 7.265 7.091 5.564 5.564 CC0
sse 10.096 9.642 9.455 5.561 5.561 CC1
Total 274.17 274.53 264.20 165.53 165.53
Percentage, % 165.63 165.85 159.60 100.00 100.00
To construct the charts (Figure 9), we used tables in which the results are shown for
all benchmarks, and not for their individual categories. To show the results for LUT counts,
we used Table 7. The cycle times are taken from Table 11. Finally, the results for the
values of maximum operating frequencies are derived from Table 14. It clearly follows from
Figure 9 that the proposed method improves the spatial characteristics of
circuits (without the degradation of temporal characteristics).
Table 13. Cycle times for classes CC2-CC4 (nanoseconds).
Benchmark Auto One-Hot JEDI MPY MPTY Class of Complexity
ex1 10.425 10.955 9.454 5.821 5.741 CC2
kirkman 10.971 10.392 10.280 5.612 5.482 CC2
planet 11.264 11.264 9.073 6.018 5.878 CC2
planet1 11.264 11.264 9.073 6.018 5.834 CC2
pma 10.634 10.634 9.681 6.101 6.101 CC2
s1 10.623 11.154 10.156 5.830 5.707 CC2
s1488 11.013 11.372 10.155 6.432 6.206 CC2
s1494 10.487 10.654 9.878 5.723 5.511 CC2
s1a 10.313 9.462 9.704 5.689 5.511 CC2
s208 9.503 9.434 9.361 6.125 5.835 CC2
s420 9.864 9.780 9.755 5.868 6.028 CC4
s510 9.742 9.742 9.155 5.324 5.834 CC4
s832 10.975 10.638 9.866 6.724 6.401 CC4
s840 9.195 9.228 9.158 6.232 5.882 CC4
sand 12.390 12.390 11.652 7.221 7.087 CC3
styr 11.067 11.497 10.666 5.921 5.719 CC2
tma 9.831 10.495 9.821 5.702 5.596 CC2
Total 179.56 180.36 166.89 102.36 100.35
Percentage, % 178.93 179.72 166.30 102.00 100.00
Table 14. Experimental results (the maximum operating frequency, MHz).
Benchmark Auto One-Hot JEDI MPY MPTY Class of Complexity
bbara 113.496 113.496 119.727 191.809 191.809 CC1
bbsse 99.049 103.713 108.539 191.342 191.342 CC1
bbtas 117.687 117.687 118.336 188.389 188.389 CC0
beecount 104.112 104.112 111.839 186.111 186.111 CC1
cse 94.713 101.626 107.03 183.399 183.399 CC1
dk14 113.364 106.439 114.134 171.26 171.26 CC1
dk15 113.675 111.137 114.487 191.626 191.626 CC1
dk16 105.362 107.294 115.316 190.654 190.654 CC1
dk17 116.049 104.308 116.049 185.192 185.192 CC0
dk27 120.122 118.709 119.494 192.487 192.487 CC0
dk512 116.74 116.74 117.963 242.792 242.792 CC0
donfile 110.706 110.696 117.517 193.504 193.504 CC1
ex1 95.922 91.281 105.777 171.796 174.19 CC2
ex2 115.808 115.808 116.34 177.799 177.799 CC1
ex3 114.536 114.536 114.846 168.594 168.594 CC0
ex4 108.53 107.352 112.69 182.443 182.443 CC1
ex5 109.327 109.327 109.661 184.328 184.328 CC0
ex6 104.556 102.333 107.183 186.268 186.268 CC1
ex7 116.306 116.576 116.495 192.304