FOR THE PAST SEVERAL YEARS, high-capacity field-programmable devices (FPDs) have enjoyed a rapidly expanding market and have become widely accepted for implementing small to moderately large digital circuits. The two main types of FPDs, field-programmable gate arrays (FPGAs) and complex programmable logic devices (CPLDs), are both widely used, and each offers particular strengths. FPGAs programmed with static RAM technology are usually based on lookup tables (LUTs). Their main strengths are very high logic capacity, in the range of hundreds of thousands of equivalent logic gates, and good speed-performance, with system clock rates of up to 50 MHz. CPLDs, on the other hand, consist of multiple PLA-like (programmable logic array) blocks in which the OR planes are partly fixed. Their characteristics include medium capacity, in the range of a few thousand gates, and ultrahigh speed-performance, sometimes in excess of a 200-MHz system clock rate.
In this article, we propose a new FPD architecture, called the Hybrid Field-Programmable Architecture (HFPA), which combines features of FPGAs and CPLDs. The HFPA rests on the observation that some parts of digital circuits are well suited for implementation with LUTs, while other parts benefit more from the product-term-based structures in CPLDs. Comparison with an architecture containing only LUTs indicates that the new architecture offers significant savings in total chip area: on average, the LUT-only architecture requires 78% more area. The HFPA can also reduce the depth of circuits implemented in the FPGA, which may improve speed-performance.
Underlying benchmark analysis
We can represent any digital circuit as a directed acyclic graph consisting of combinational and sequential nodes, with each combinational node expressed in sum-of-products form. As the first step in developing the HFPA, we examined the combinational nodes present in real circuits and produced a distribution of the nodes with respect to size. We defined a node's size by two parameters: the number of inputs to the node and the number of product terms (pterms) in the node's sum-of-products representation. The circuits we used are from the 1993 MCNC (Microelectronics Center of North Carolina) logic synthesis benchmark suite. We passed the circuits through one run of SIS script.rugged,¹ a technology-independent optimization script. This resulted in 40,131 combinational nodes in 197 benchmarks. Our examination of the nodes revealed that more than 70% are 4-bounded (that is, they have at most four inputs) and roughly 20% have fan-ins of six or more; we refer to the latter as high-fan-in nodes.
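The two-parameter size measure and the two categories above can be tabulated mechanically. The sketch below is a minimal illustration, assuming each node has already been reduced to its (inputs, pterms) pair, for example by reading the optimized netlist. The `Node` type and the function names are hypothetical, and the sample nodes are toy values, not MCNC data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A combinational node, sized by its sum-of-products form (hypothetical type)."""
    name: str
    num_inputs: int  # fan-in of the node
    num_pterms: int  # number of product terms in its SOP form


def size_histogram(nodes):
    """Two-parameter size distribution: how many nodes have each (inputs, pterms) pair."""
    return Counter((n.num_inputs, n.num_pterms) for n in nodes)


def classify(nodes, k=4, high_fanin=6):
    """Return (fraction of k-bounded nodes, fraction of nodes with fan-in >= high_fanin)."""
    if not nodes:
        return 0.0, 0.0
    bounded = sum(1 for n in nodes if n.num_inputs <= k)
    high = sum(1 for n in nodes if n.num_inputs >= high_fanin)
    return bounded / len(nodes), high / len(nodes)


# Toy nodes for illustration only; these are not MCNC benchmark data.
toy = [Node("a", 2, 1), Node("b", 4, 3), Node("c", 3, 2), Node("d", 7, 4), Node("e", 5, 3)]
print(classify(toy))  # (0.6, 0.2)
print(size_histogram(toy))
```

A node with exactly five inputs, such as `e`, falls into neither category. This is why the two percentages in the text need not sum to 100%.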
The Hybrid Field-Programmable Architecture. University of Toronto. IEEE Design & Test of Computers, April–June 1999. 0740-7475/99/$10.00 © 1999 IEEE

The authors propose a new architecture that combines two existing kinds of device: lookup-table-based FPGAs and logic devices based on PLA-like blocks. Their mapping results indicate that, on average, LUT-based FPGAs require 78% more area than their hybrid FPGA while providing roughly the same circuit depth.

We wanted to consider implementing the nodes in two types of logic resources: PLA-like blocks and LUTs. For a LUT with $K$ inputs, the cell's area is proportional to $2^K$. For a PLA-like block, the area is approximately proportional to $K \times P$, where $P$ is the number of pterms in the block. Assuming $P$ is close to $K$, we can simplify the representative area of a PLA-like block to $K^2$. For $K = 4$, $2^K = K^2 = 16$, and for $K < 4$, LUTs are generally more efficient than PLAs. For $K > 4$, however, $2^K$ grows much faster than $K^2$. Therefore, 4-bounded nodes can be efficiently implemented with LUTs. This accounts for most nodes in circuits, but a significant number of nodes still have high fan-ins. We could also implement these nodes with 4-LUTs, but the area required would be large. We observed that most high-fan-in nodes do not require a large number of pterms and are therefore well suited for PLA implementation. For this reason, we decided the HFPA would contain both PLA-like blocks, which we call programmable array logic blocks (PALBs), and 4-LUTs.
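The area argument can be checked numerically. The following sketch encodes only the crude proportionality model stated above: $2^K$ for a LUT, and $K \times P$ for a PLA-like block with $P$ defaulting to $K$. The function names are illustrative. The model ignores routing and the fact that a real PALB is shared by several nodes.

```python
def lut_area(k):
    """Relative area of a K-input LUT: grows as 2**K (one configuration bit per truth-table row)."""
    return 2 ** k


def pla_area(k, p=None):
    """Relative area of a PLA-like block: roughly K * P; P defaults to K, as assumed in the text."""
    return k * (k if p is None else p)


def cheaper(k, p=None):
    """Name the smaller resource under this crude model (ties go to the LUT)."""
    return "LUT" if lut_area(k) <= pla_area(k, p) else "PLA"


for k in range(3, 9):
    print(k, lut_area(k), pla_area(k), cheaper(k))
```

The two estimates meet at $K = 4$ (16 each), and from $K = 5$ on the PLA-like block wins. Calling `cheaper(8, p=40)` returns `"LUT"` (256 versus 320), which shows why a high-fan-in node suits a PALB only when it needs few pterms.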
Figure 1 illustrates how well nodes of different sizes suit either LUTs or PLAs. The example circuit consists of five combinational nodes, each represented by its personality matrix. The columns of a personality matrix are associated with the inputs to the combinational node, and its rows correspond to the pterms in the node's sum-of-products form. For example, a 1 in the second row and fourth column means that the fourth input to the node appears with positive polarity in the second pterm of the node's sum-of-products form. Similarly, a 0 at this position means that the corresponding literal appears with negative polarity, and the symbol "–" means that the corresponding literal does not appear in the pterm. Therefore, the numbers of rows and columns in a personality matrix equal the numbers of pterms and inputs, respectively, in the node's sum-of-products form. Note that we do not consider inverters as separate nodes, because they can be realized in their fan-in cells with no extra cost.
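To make the encoding concrete, the sketch below stores each row as a string over `0`, `1`, and `-` (ASCII `-` stands in for the article's "–"). It reports the matrix dimensions as (inputs, pterms) and evaluates the node's sum-of-products for one input vector. The function names and the example node are hypothetical and are not taken from the authors' tools.

```python
def matrix_size(rows):
    """Check a personality matrix (rows over '0', '1', '-') and return (inputs, pterms)."""
    if not rows:
        raise ValueError("empty personality matrix")
    width = len(rows[0])
    for row in rows:
        if len(row) != width or set(row) - set("01-"):
            raise ValueError(f"malformed row: {row!r}")
    return width, len(rows)


def evaluate(rows, bits):
    """Evaluate the node's SOP: 1 if any pterm (row) is satisfied by the input bits."""
    inputs, _ = matrix_size(rows)
    if len(bits) != inputs:
        raise ValueError("wrong number of inputs")
    for row in rows:
        # '1' needs the input high, '0' needs it low, '-' means the literal is absent
        if all(c == "-" or int(c) == b for c, b in zip(row, bits)):
            return 1
    return 0


# f = x1 x3' + x2 x3, a hypothetical 3-input, 2-pterm node
f = ["1-0", "-11"]
print(matrix_size(f))          # (3, 2)
print(evaluate(f, (1, 0, 0)))  # 1
print(evaluate(f, (0, 0, 1)))  # 0
```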
If we implement the circuit shown in Figure 1 in a LUT-based FPGA, we will need four 4-LUTs for nodes B, C, D, and E, in addition to the LUTs required to implement node A. We used the Synopsys FPGA compiler with the highest optimization effort to map node A to 4-LUTs; we needed a total of 15 LUTs. Therefore, we needed at least 19 LUTs to implement the five combinational nodes. In an architecture that contains PALBs as well as LUTs, we can implement node A in one PALB, and the rest of the nodes will require four 4-LUTs. This is equivalent in area to only eight 4-LUTs, as explained later. The key point is that when synthesis tools fail to find good decompositions for a node such as A, the area necessary to realize the node with LUTs is high.
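The bookkeeping behind these counts is simple arithmetic in units of one 4-LUT. The sketch below assumes one PALB occupies about the area of four 4-LUTs. This value is inferred, not stated directly: one PALB plus four 4-LUTs is said to equal eight 4-LUTs. The constant and function names are illustrative only.

```python
# Area in units of one 4-LUT. PALB_AREA = 4 is inferred from the text:
# one PALB plus four 4-LUTs is "the equivalent area of only eight 4-LUTs".
LUT_AREA = 1.0
PALB_AREA = 4.0


def lut_only_area(luts_for_a, small_nodes):
    """LUT-only mapping: node A's 4-LUT decomposition plus one 4-LUT per 4-bounded node."""
    return (luts_for_a + small_nodes) * LUT_AREA


def hybrid_area(small_nodes, palbs_for_a=1):
    """Hybrid mapping: node A in PALB(s), each 4-bounded node in its own 4-LUT."""
    return palbs_for_a * PALB_AREA + small_nodes * LUT_AREA


lut_only = lut_only_area(luts_for_a=15, small_nodes=4)  # 19.0
hybrid = hybrid_area(small_nodes=4)                     # 8.0
print(lut_only, hybrid, round(lut_only / hybrid, 2))
```

The same constant also reproduces the text's later figure for a PALB-only CPLD: 2.5 PALBs cost the area of 10 4-LUTs. The LUT-only value is a lower bound, because 19 is the minimum count reported.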
On the other hand, O2 in Figure 1 is efficiently decomposed into four 4-bounded nodes; therefore, 4-LUTs can implement O2 in a reasonable area. If we fully collapse nodes B to E, the result will have 12 inputs and 26 pterms. Such a large number of pterms is expensive in area when implemented in a PLA-like block. In a CPLD containing only PLA-like blocks the size of our PALBs, the circuit in Figure 1 will take at least 2.5 PALBs, equivalent in area to 10 4-LUTs. Therefore, an architecture that contains a mixture of PALBs
4-LUT four-input LUT
BLIF Berkeley Logic Interchange Format
CPLD complex programmable logic device
EDIF Electronic Design Interchange Format
FPGA field-programmable gate array
HFPA the Hybrid Field-Programmable Architecture
LUTB LUT block; contains four 4-LUTs that can be locally
PALB programmable array logic block (a PLA-like block)
PLA programmable logic array
pterm product term
Figure 1. An example of combinational nodes in a circuit.
reason, we believe that the rest of the synthesis flow will be quite similar to synthesis for any architecture with a hierarchical routing structure. Therefore, synthesis to the HFPA is commercially viable, and the extra complexity is well justified by the new architecture's advantages.
Effects of technology-independent optimization. The first step in synthesizing circuits to a specific architecture is often technology-independent optimization, which consists of a series of partial collapsing and factoring (decomposition) operations. Decomposing the circuits efficiently is especially important when the target architecture accommodates only low-fan-in nodes, as LUT-based FPGAs do. The SIS optimization scripts may not be as powerful as those in state-of-the-art commercial logic synthesis tools, which may find better decompositions. Using these tools would lead to a lower area gain but a higher depth gain for the HFPA. However, the optimization algorithms in commercial tools are often integrated with specific target architectures, so it is difficult to investigate the advantages of the HFPA with the technology-independent optimization methods incorporated in these tools. Our comparison of script.rugged and script.algebraic showed that the latter finds better decompositions, leading to a lower area gain for the HFPA.
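To illustrate what a factoring step does to a node's size, here is a toy version that divides a sum-of-products by its most frequent literal. This is a deliberate simplification: SIS scripts rely on kernel-based algebraic factoring and other transformations, not this one-literal step. The function names are hypothetical.

```python
from collections import Counter


def factor_by_literal(cubes):
    """One step of algebraic factoring by a single literal.

    cubes: list of frozensets of literals, one per pterm (e.g. {"a", "b"} is ab).
    Returns (lit, quotient, remainder) so that f = lit * (quotient) + remainder,
    or None if no literal is shared by at least two cubes.
    """
    counts = Counter(lit for cube in cubes for lit in cube)
    if not counts:
        return None
    lit, n = counts.most_common(1)[0]
    if n < 2:
        return None
    quotient = [cube - {lit} for cube in cubes if lit in cube]
    remainder = [cube for cube in cubes if lit not in cube]
    return lit, quotient, remainder


def literal_count(cubes):
    """Total number of literal occurrences in a list of cubes."""
    return sum(len(c) for c in cubes)


# f = ab + ac + d  ->  a(b + c) + d
f = [frozenset({"a", "b"}), frozenset({"a", "c"}), frozenset({"d"})]
lit, q, r = factor_by_literal(f)
print(lit, q, r)
print(literal_count(f), 1 + literal_count(q) + literal_count(r))  # 5 4
```

Each such step removes repeated literals and splits a wide node into smaller ones. More thorough decomposition of this kind leaves fewer high-fan-in nodes for PALBs to absorb, which is consistent with the lower HFPA area gain observed with script.algebraic.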
Another issue is that our choice of PALB was based on our
analysis of benchmarks after optimization. If we had used
different optimization tools, we might have reached slightly
different results in terms of the PALB parameters. However,
we believe that our current choices are reasonable.
IN THE FUTURE, we intend to investigate HFPA synthesis issues more thoroughly. Our technology mapper needs some improvements. We also plan to develop a placement-and-routing tool to investigate the amount of routing resources required and the appropriate positioning of switches. Incorporating an automated partial collapser designed especially for the HFPA into the technology mapper might increase the gain and make mapping easier.⁷ Finally, the HFPA can likely be enhanced in several ways, and we will continue improving the architecture.
1. E.M. Sentovich et al., SIS: A System for Sequential Circuit Synthesis, Memorandum No. UCB/ERL M92/41, Electronics Research Laboratory, Dept. of Electrical Engineering and Computer Science, Univ. of California, Berkeley, 1992.
2. J. Rose et al., "Architecture of Field-Programmable Gate Arrays: The Effect of Logic Block Functionality on Area Efficiency," IEEE J. Solid-State Circuits, Vol. 25, No. 5, Oct. 1990, pp.
3. A. Aggarwal and D. Lewis, "Routing Architectures for Hierarchical Field-Programmable Gate Arrays," Proc. Int'l Conf. Computer Design, IEEE Computer Society Press, Los Alamitos, Calif., 1994, pp. 475-478.
4. B.R. Owen et al., "BALLISTIC: An Analog Layout Language," Proc. IEEE Custom Integrated Circuits Conf., IEEE CS Press, 1995,
5. J. Cong and Y. Ding, "FlowMap: An Optimal Technology Mapping Algorithm for Delay Optimization in Lookup-Table-Based FPGA Designs," IEEE Trans. CAD of Integrated Circuits and Systems, Vol. 13, No. 1, Jan. 1994, pp. 1-12.
6. H.J. Touati, H. Savoj, and R.K. Brayton, "Delay Optimization of Combinational Logic Circuits by Clustering and Partial Collapsing," Proc. IEEE Conf. Computer-Aided Design, IEEE CS Press, 1991, pp. 188-191.
7. A. Kaviani, Novel Architectures and Synthesis Methods for High Capacity Field Programmable Devices, PhD dissertation, Dept. of Electrical and Computer Engineering, Univ. of Toronto, 1999.
8. A.H. Farrahi and M. Sarrafzadeh, "Complexity of the Lookup-Table Minimization Problem for FPGA Technology Mapping," IEEE Trans. CAD of Integrated Circuits and Systems, Vol. 13, No. 11, Nov. 1994, pp. 1319-1332.
Alireza Kaviani is a lecturer at the University of Toronto. His main research interests include architectures and synthesis methods for FPDs and microprocessor systems. He has more than three years of industry experience, including a year at Hewlett-Packard. Kaviani holds an MASc degree in computer engineering from the University of Toronto and a BS degree in electronic engineering from Sharif University, Iran. He received his PhD in electrical and computer engineering from the University of Toronto. He is a member of the IEEE and the Computer Society.

Stephen Brown is an associate professor of electrical and computer engineering at the University of Toronto. His dissertation on architecture and CAD for FPGAs won the Canadian NSERC's 1992 prize for the best doctoral thesis in Canada. He won a best paper award at the 1990 ICCAD. He has also won four awards for excellence in teaching electrical engineering, computer engineering, and computer science. He is a coauthor of the book Field-Programmable Gate Arrays. He was general and program chair of the Fourth Canadian Workshop on Field-Programmable Devices. Brown holds a PhD in electrical engineering from the University of Toronto. He is a member of the IEEE and the Computer Society.
Send questions and comments about this article to the authors
at Dept. of Electrical and Computer Eng., 10 King’s College Rd.,
University of Toronto, Toronto, Ontario, Canada M5S 3G4;
email@example.com and firstname.lastname@example.org.