Proposal of a New Structure for NetFPGA Cards

Image Processing & Communications, vol. 22, no. 1, pp. 27-34
DOI: 10.1515/ipc-2017-0003
Poznan University of Technology, Faculty of Electronics and Telecommunications,
Polanka 3, 60-965 Poznan, Poland
Abstract. In this paper, we present a proposal of a new internal structure for NetFPGA cards and its analysis. We propose to use a switching fabric instead of a single pipeline. Such a change complicates some modules and combines several of them into a single, more complex module. Their functionality remains the same, but the delay of the served Ethernet frames will be decreased.
1 Introduction
Nowadays, the IT world develops very fast. There are many technologies offering high throughput, high capabilities, high performance, and so on. Most of them are enclosed in industrial chips (ASICs), and users are only users: they have no influence on the functionality of such chips. However, there is a perfect technology for users who want to implement their own functionality in very fast chips. This technology is known as FPGA (Field Programmable Gate Array). FPGA chips are programmable digital chips that offer very flexible functionality, which can be implemented very quickly. Professional companies and chip producers have prepared many development boards that are ready to be programmed and used, so the user has a perfect tool for prototyping and research. In this paper, we focus on NetFPGA cards. They were designed by Stanford University and the University of Cambridge and are manufactured by Digilent and HiTech Global. A NetFPGA card has a main FPGA chip and Ethernet network interfaces. Ethernet frames received on these interfaces are treated as digital data and processed in the main chip. According to this philosophy, a NetFPGA card works as a network node. Such a node is programmable over a very wide range, and it realizes its functionality very fast (in programmable hardware). In this paper, we describe our proposal of a new internal structure for NetFPGA cards. The rest of this paper is organized as follows: Section 2 describes NetFPGA cards, Section 3 presents our motivation for the work described here, and Sections 4 and 5 describe the current and new structures of NetFPGA cards and their analysis. Finally, in Section 6, we conclude the presented work and describe future work.
2 NetFPGA Cards
NetFPGA cards are typical PC extension cards [1]. They
are widely described in the literature and on websites, and
Download Date | 5/20/18 2:20 AM
28 M. Michalski, T. Sielach
Fig. 1: NetFPGA card with electrical interfaces with the speed of 1 Gbps
here we only mention the most important facts. These cards are currently available in two versions, 1 Gbps and 10 Gbps. The main differences between them are the maximal throughput of their Ethernet ports and the main FPGA device. A NetFPGA card with 1 Gbps interfaces uses a Virtex-II chip, while the NetFPGA 10G uses a Virtex-5. Their photographs are shown in Figs. 1 and 2.
As mentioned earlier, the cards were designed by members of the NetFPGA group, which is composed of scientists and researchers from Stanford University and the University of Cambridge [1]. The extension cards are offered at reasonable prices, and the software is available under a BSD license.
Fig. 2: NetFPGA card with universal SFP+ ports with the speed of 10 Gbps
Fig. 3: Schematic of NetFPGA cards (blocks shown: PCI/PCIe driver, FPGA chip, Input Arbiter, Output Queues, User Module, Output Port Lookup)
The main FPGA chip provides the flexibility and high performance of the card. It is configured using an HDL (Hardware Description Language), so almost any idea can be implemented on the chip. It is only a matter of time and the designer's skills. Consequently, it is a powerful tool for scientists, researchers and students alike [2], [3].
The code base provided for a NetFPGA card contains reference projects and contributed projects [4]. By using them, a student can become familiar with the framework and with other tools, such as traffic analyzers and Perl/Python scripts.
The development of a new NetFPGA project involves a few obligatory stages. The first stage is to make a functional design. The functionality is then encoded in an HDL (such as Verilog or VHDL). Using modules from the NetFPGA code base is a good idea, because these pieces of code have already been tested. Many tools have been created to support HDL code preparation and analysis. The main programmable device on NetFPGA cards (both 1G and 10G) is made by Xilinx, so the dedicated tools also come from this company. Because the two cards have different code architectures, ISE (Integrated Software Environment) [5] is used for the 1G card and Xilinx Platform Studio (XPS) [6] for the 10G card. These
tools support syntax checking, synthesis and simple simulations. More complex projects need a more advanced simulation tool: ModelSim from Mentor Graphics [7] for the 1G card and ISim from Xilinx [8] for the 10G card. These tools make a deep and detailed analysis of the functionality of the designed chip convenient. Once the design has been verified to operate correctly, it should be synthesized. It is important to check as much as possible in simulation, because the synthesis process is very time-consuming. The final result of the synthesis flow from HDL code is a *.bit file, which can be downloaded to the FPGA device and run there directly.
A NetFPGA reference pipeline consists of queues associated with physical ports. The 4 physical ports give 4 input queues for incoming frames and 4 output queues for outgoing frames (marked green in Fig. 3). This structure is also used to map the physical ports to the PC side, where the operating system detects the ports via a driver. Their default names are nf2c0, nf2c1, nf2c2 and nf2c3 for the 1G card, and nf0, nf1, nf2 and nf3 for the 10G card.
An incoming frame is placed in the appropriate queue (RxQueues) and passed to the Input Arbiter module. The arbiter chooses one of the input queues, takes a packet from it and sends it along the pipeline. Consequently, the port or queue through which a packet arrives does not matter, because all packets are sent into a single pipeline. The packets are then processed by the modules along the pipeline. At the end, the Output Port Lookup module decides to which output queue (TxQueue) the packet is to be sent.
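The arbitration described above can be sketched as a small behavioural model in Python. The model assumes that the arbiter serves the non-empty input queues in round-robin order. The text does not state this policy. The class and method names (`RoundRobinArbiter`, `enqueue`, `grant`) are hypothetical and are not taken from the NetFPGA code base. The model only illustrates the key property: frames from all input queues end up in one serial sequence that enters the single pipeline.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class RoundRobinArbiter:
    """Toy model of an input arbiter feeding a single pipeline.

    Assumption (not from the paper): non-empty input queues are served
    in round-robin order, one frame per grant.
    """

    def __init__(self, num_queues: int) -> None:
        self.queues: List[Deque[str]] = [deque() for _ in range(num_queues)]
        self.next_queue = 0  # queue examined first on the next grant

    def enqueue(self, queue_id: int, frame: str) -> None:
        """Place an incoming frame in the input queue of its port."""
        self.queues[queue_id].append(frame)

    def grant(self) -> Optional[Tuple[int, str]]:
        """Pick the next non-empty queue in round-robin order and pop one frame."""
        n = len(self.queues)
        for offset in range(n):
            q = (self.next_queue + offset) % n
            if self.queues[q]:
                self.next_queue = (q + 1) % n  # start after the served queue next time
                return q, self.queues[q].popleft()
        return None  # all input queues are empty


arb = RoundRobinArbiter(4)
for port in range(4):
    arb.enqueue(port, f"frame-from-port-{port}")
arb.enqueue(0, "second-frame-port-0")

# Every granted frame enters the same single pipeline, one after another.
pipeline_order = []
grant = arb.grant()
while grant is not None:
    pipeline_order.append(grant)
    grant = arb.grant()
print(pipeline_order)
```

Fixed-priority or weighted policies would fit the same interface. Round-robin is used here only because it is a simple, fair choice for illustration.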
A new project on NetFPGA can be started from scratch,
which means preparing all the code for processing pack-
ets, or one might only prepare one module and insert it
into a reference pipeline.
Reference NIC [4] (Network Interface Card) is an example of a simple project. All frames are sent along the pipeline, and the Output Port Lookup only selects the output port according to the input port. For example, a packet arriving at the first physical port is sent to the nf0 logical PC port (on the 10G card), and a frame sent from the PC through nf2 is forwarded to the third physical port.
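The Reference NIC lookup can be illustrated as a pure function of the input port. In this sketch, ports are represented as hypothetical `("MAC", i)` and `("CPU", i)` tuples with 0-based indices. The actual design encodes source and destination ports in its own internal format, which is not reproduced here. The interface names are the defaults listed earlier for the 1G and 10G cards. The function names are illustrative only.

```python
from typing import Tuple

# Hypothetical representation: ("MAC", i) is physical port i,
# ("CPU", i) is the host-side (PC) port i; indices are 0-based.
Port = Tuple[str, int]

HOST_NAMES = {
    "1G": ["nf2c0", "nf2c1", "nf2c2", "nf2c3"],
    "10G": ["nf0", "nf1", "nf2", "nf3"],
}


def nic_output_port(input_port: Port) -> Port:
    """Reference-NIC style lookup: the output depends only on the input port.

    Frames from physical port i go to host port i, and vice versa.
    """
    kind, index = input_port
    if kind == "MAC":
        return ("CPU", index)
    if kind == "CPU":
        return ("MAC", index)
    raise ValueError(f"unknown port kind: {kind}")


def host_name(port: Port, card: str = "10G") -> str:
    """OS interface name of a host-side port on the given card version."""
    kind, index = port
    if kind != "CPU":
        raise ValueError("only host-side ports have OS interface names")
    return HOST_NAMES[card][index]


# First physical port (index 0) -> nf0 on the 10G card
print(host_name(nic_output_port(("MAC", 0))))   # nf0
# Frame sent by the host through nf2 -> third physical port (index 2)
print(nic_output_port(("CPU", 2)))              # ('MAC', 2)
```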
NetFPGA cards, especially the ones with the Virtex-5, can also be used for more general purposes, not only for networking. Algorithms that seem very complicated [9]-[13] can be implemented in the FPGA chip, and their hardware realization can be very efficient because some operations are parallelized [14]-[16].
3 Motivation
We have used, analyzed and tested many existing reference projects, and we have also prepared several of our own. We know NetFPGA cards in some detail and are familiar with their internal structure (both the 1 and 10 Gbps versions). We found that the performance of data processing in the main chip can be improved, so we decided to introduce relatively small changes in the way the main pipeline is realized. Through this modification, we would like to obtain the same throughput but a smaller delay of Ethernet frames. It can be said that the implementation of our ideas introduces multi-core, parallel frame processing in hardware chips. We will prepare a model that allows us to simulate, investigate and compare the performance of both versions. More importantly, we will build a prototype of the new version in our laboratory and investigate it. We are going to measure our prototype with professional industrial network analyzers. We also plan to use the results of our investigation in newer versions of the cards.
4 Current Structure
In the reference projects, there is a single main pipeline, as presented in Fig. 4. When the Input Arbiter chooses frames for processing, they are sent to the next modules one by one. A frame is sent to the next module only after the current module has completely finished processing it. In the worst case, all input queues can offer frames at the same moment (this situation is presented in Fig. 6). In such a situation, all frames will be processed, but the last of them will have to wait until all the previous ones have been processed. Typically, the processing of a frame is realized in two logical phases: the first for the header of the frame and the second for the payload (the data of the frame). These phases are realized by different parts of the hardware. In such an architecture, the part of the hardware for data transmission is idle while the header is being analyzed. Likewise, the hardware for the header is not used while the data of the frame are being transmitted. The consequences of this are clearly visible in Fig. 7.
Fig. 4: Current structure for data processing in NetFPGA (blocks shown: Input Arbiter, Output Queues, User Module, Output Port Lookup)
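The timing behaviour of Fig. 7 can be approximated with a simple model. The model assumes that every frame needs a fixed header time t_h followed by a fixed payload time t_p. It also assumes that the single pipeline starts the next header only after the previous payload has finished. The function name `serial_schedule` and the time values are illustrative assumptions, not measurements. Under these assumptions, the k-th frame (counting from 1) completes at k(t_h + t_p), so the last of N simultaneously offered frames finishes at N(t_h + t_p).

```python
from typing import List, Tuple


def serial_schedule(n_frames: int, t_header: float, t_payload: float) -> List[Tuple[float, float, float]]:
    """Return (start, header_done, payload_done) for each frame in a single pipeline.

    Model assumptions: frames are processed strictly one at a time; the payload
    phase follows the header phase, and the next frame's header phase starts only
    after the previous frame's payload phase has finished.
    """
    schedule = []
    t = 0.0
    for _ in range(n_frames):
        header_done = t + t_header              # header hardware busy, payload hardware idle
        payload_done = header_done + t_payload  # payload hardware busy, header hardware idle
        schedule.append((t, header_done, payload_done))
        t = payload_done                        # pipeline is free only after the whole frame
    return schedule


# Four frames offered at the same moment, as in Fig. 6 (illustrative time units)
schedule = serial_schedule(4, t_header=1.0, t_payload=3.0)
for i, (start, hdr, done) in enumerate(schedule):
    print(f"frame {i}: start={start:4.1f} header_done={hdr:4.1f} done={done:4.1f}")
```

With t_h = 1 and t_p = 3 time units, the four frames finish at 4, 8, 12 and 16.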
Fig. 5: New structure for data processing in NetFPGA (blocks shown: Switching Fabric 8x8, Modified Control Module)
5 Proposal of New Structure and Its Analysis
We are going to rebuild the parts responsible for header analysis and data transmission. We would like the transmission not to be blocked while a header is being analyzed, and vice versa. This is possible, but it requires a considerable modification of the pipeline. We propose to combine both functionalities in one, more complex, module. We will prepare a module that takes the next frame for analysis as soon as it has analyzed the header of the current frame, while the data part of the just-analyzed frame is sent for transmission. Importantly, in such a case the transmission has to be realized by a structure more complex than a single pipeline. We propose to use a switching fabric, which can transmit several frames at the same time. When frames from different inputs of the switching fabric (input queues) are directed to different outputs of the switching fabric, they can be transmitted simultaneously without any conflict. The idea of our proposal is presented in Fig. 5. We are planning to investigate different types of switching fabrics, starting with the basic one, which can be realized with multiplexers. We are not planning to change the scheduling algorithm implemented in the reference design.
Legend of the timing diagrams: time for header processing, time for payload processing.
Fig. 6: Four frames available at the same moment in the input queues
Fig. 7: "Time table" for processing four frames in a row - primary pipeline
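The effect of overlapping header analysis with payload transmission can be estimated with the same kind of model. This sketch assumes the following:
• A single header-analysis unit takes the next frame as soon as it finishes a header.
• The switching fabric carries payloads to different outputs concurrently.
• Payloads to the same output are serialized.
The names `overlapped_schedule` and `serial_last_done` and the time values are hypothetical. With N frames to distinct outputs, the last frame completes at N t_h + t_p, compared with N(t_h + t_p) for the single pipeline.

```python
from typing import Dict, List, Tuple


def overlapped_schedule(dests: List[int], t_header: float, t_payload: float) -> List[Tuple[float, float]]:
    """Return (header_done, payload_done) per frame for the proposed structure.

    Model assumptions (illustrative only):
    - one header-analysis unit handles frames back to back;
    - once a header is analyzed, the payload goes to a switching fabric that
      carries transfers to different outputs concurrently;
    - transfers to the same output are serialized.
    """
    output_free: Dict[int, float] = {}  # time at which each output becomes idle
    result = []
    t = 0.0
    for dest in dests:
        header_done = t + t_header
        start = max(header_done, output_free.get(dest, 0.0))  # wait if output is busy
        payload_done = start + t_payload
        output_free[dest] = payload_done
        result.append((header_done, payload_done))
        t = header_done  # header unit immediately takes the next frame
    return result


def serial_last_done(n_frames: int, t_header: float, t_payload: float) -> float:
    """Completion time of the last frame in the single-pipeline model."""
    return n_frames * (t_header + t_payload)


h, p = 1.0, 3.0
distinct = overlapped_schedule([0, 1, 2, 3], h, p)  # four frames, four different outputs
same_out = overlapped_schedule([0, 0, 0, 0], h, p)  # worst case: all frames to one output
print("serial:", serial_last_done(4, h, p))          # 16.0
print("new, distinct outputs:", distinct[-1][1])    # 7.0 = 4*1 + 3
print("new, same output:", same_out[-1][1])         # 13.0
```

Even when all frames target the same output, this model finishes earlier (13 instead of 16 time units), because the header analysis of the next frame overlaps with the payload transfer of the previous one. The larger gain appears only when the outputs do not conflict.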
We are confident that nodes with the new structure will have a smaller Ethernet frame delay, because several frames can be served at the same time. This situation is presented in Fig. 8. We are going to verify this in several ways. First, we are going to prepare a typical simulation on a PC. We have also decided to prepare an analogous simulation in hardware, so that it runs very fast. This means building models of the old and new structures in an FPGA chip dedicated to simulations. After we determine the parameters of both versions, we will implement a prototype and compare real nodes. For this analysis, we will use our own network analyzers as well as professional, highly certified ones [20]-[22]. At this moment, we have prepared a hardware traffic generator and modules for analyzing the served traffic.
Fig. 8: "Time table" for processing four frames in parallel - modified pipeline
A switching fabric is dedicated to the transmission of frames from its inputs to its outputs. It can transmit several frames simultaneously, provided they are not in conflict. It can be said that there is a dedicated hardware transmission path for every possible input-output pair. With such an approach, transmission is parallelized by a multi-block structure.
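A basic switching fabric built from multiplexers can be modelled as one multiplexer per output, whose select signal names the input it forwards. The sketch below assumes the 8x8 size labelled in Fig. 5. It resolves conflicts with a hypothetical fixed priority, in which earlier requests win. This is not the reference scheduler, which the authors do not plan to change. All requests from distinct inputs to distinct outputs are set up in the same slot, which is the parallelism described above. The function names are illustrative only.

```python
from typing import List, Optional, Set, Tuple

N_PORTS = 8  # assumption: 8x8 fabric, as labelled in Fig. 5


def configure_crossbar(requests: List[Tuple[int, int]]) -> Tuple[List[Optional[int]], List[Tuple[int, int]]]:
    """Set one multiplexer per output for a single transfer slot.

    requests: (input, output) pairs in priority order (hypothetical fixed priority).
    Returns mux_select, where mux_select[o] is the input driving output o (or None),
    and the requests deferred because their output or input was already taken.
    """
    mux_select: List[Optional[int]] = [None] * N_PORTS
    busy_inputs: Set[int] = set()
    deferred: List[Tuple[int, int]] = []
    for inp, out in requests:
        if mux_select[out] is None and inp not in busy_inputs:
            mux_select[out] = inp  # output mux now forwards this input
            busy_inputs.add(inp)
        else:
            deferred.append((inp, out))  # conflict: retry in a later slot
    return mux_select, deferred


def forward(mux_select: List[Optional[int]], input_words: List[Optional[int]]) -> List[Optional[int]]:
    """Move one data word from each selected input to its output, all in parallel."""
    return [None if sel is None else input_words[sel] for sel in mux_select]


sel, waiting = configure_crossbar([(0, 4), (1, 5), (2, 6), (3, 4)])
print(sel)      # [None, None, None, None, 0, 1, 2, None]
print(waiting)  # [(3, 4)]
words = [10, 11, 12, 13, None, None, None, None]
print(forward(sel, words))  # [None, None, None, None, 10, 11, 12, None]
```

Here, three frames are transferred in the same slot because their outputs differ. The fourth request targets output 4, which is already in use, so it waits for a later slot.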
6 Future Work and Conclusions
We are aware that we have proposed a big change in the working mechanism, but we believe that the effort is worthwhile. We trust that even a small improvement in the parameters affecting the efficiency of high-throughput nodes can result in a great improvement elsewhere in the network.
The work and plans described in this paper allow us to modify the existing structure and obtain nodes with better performance.
We also plan to expand our equipment with a new version of the cards and to implement our ideas and solutions in them.
The work described in this paper was financed from the funds of the Ministry of Science and Higher Education for the year 2016.
We would also like to thank (in alphabetical order):
• AM Technologies Team [23] for lending us their network analyzer with a 10 Gbps interface;
• NetFPGA Teams from Stanford University and the University of Cambridge [1] for their support and help in organizing the NetFPGA workshop in Poland, and for their overall work with NetFPGA cards;
• Xilinx University Program [24] for donating five NetFPGA cards with 10G interfaces to the Poznan University of Technology.
[1] Website of community and NetFPGA project:
[2] Gibb G., Lockwood J.W., Naous J., Hartke P., McKeown N. (2008, August). NetFPGA: An Open Platform for Teaching How to Build Gigabit-rate Network Switches and Routers. In IEEE Transactions on Education, vol. 51, no. 3, pp. 364-369.
[3] Zilberman N., Audzevich Y., Covington G.A., Moore A.W. (2014, September). NetFPGA SUME: Toward 100 Gbps as Research Commodity. IEEE Micro, vol. 34, no. 5, pp. 32-41.
[4] NetFPGA reference projects.
[5] Xilinx ISE 10.1 Quick Start Tutorial. www.xilinx.
[6] Xilinx Platform Studio.
[7] ModelSim.
[8] ISE In-Depth.
[9] Kabaciński W., Michalski M. (2005, May). Wide-sense Nonblocking Log2(N, 0, p) Switching Networks with Even Number of Stages. In Proc. IEEE ICC 2005, Seoul, South Korea.
[10] Kabaciński W., Michalski M. (2006, December). The Routing Algorithm and Wide-Sense Nonblocking Conditions for Multiplane Baseline Switching Networks. In IEEE Journal on Selected Areas in Communications, vol. 24, no. 12, pp. 35–44.
[11] Danilewicz G., Kabaciński W., Zal M., Michalski M. (2008, April). A New Control Algorithm for Wide-sense Nonblocking Multiplane Photonic Banyan-type Switching Fabrics with Zero Crosstalk. In IEEE Journal on Selected Areas in Communications, vol. 26, no. 3, part 2, pp. 54–64.
[12] Kabaciński W., Kleban J., Michalski M., Zal M., Pattavina A., Maier G. (2009, June). Rearranging Algorithms for Log2(N, 0, p) Switching Networks with Even Number of Stages. International Workshop on High Performance Switching and Routing, Paris, France.
[13] Kabaciński W., Michalski M. (2011, June). The Algorithm for Rearrangements in the Log2(N, 0, p) Fabrics with Odd Number of Stages. In IEEE International Conference on Communications ICC 2011, Kyoto, Japan.
[14] Kabaciński W., Michalski M. (2010, June). The FPGA Implementation of the Log2(N, 0, p) Switching Fabric Control Algorithm. In IEEE International Conference on High Performance Switching and Routing, Dallas, TX, USA.
[15] Kabaciński W., Michalski M. (2011, July). The FPGA Controller for the Rearrangeable Log2(N, 0, p) Fabrics with an Even Number of Stages. In IEEE International Conference on High Performance Switching and Routing, HPSR 2011, Cartagena, Spain.
[16] Kabaciński W., Michalski M. (2013, June). The Control Algorithm and the FPGA Controller for Non-interruptive Rearrangeable Log2(N, 0, p) Switching Networks. In IEEE International Conference on Communications, ICC 2013, Budapest, Hungary.
[17] Michalski M. (2012, July). The Configurations for Experimental Study of the Network Performance. In 8th IEEE, IET International Symposium on Communication Systems, Networks and Digital Signal Processing (CSNDSP 2012).
[18] Michalski M. (2009, April). A Software and Hardware System for a Fully Functional Remote Access to Laboratory Networks. In The Fifth International Conference on Networking and Services, Valencia, Spain, pp. 561-565.
[19] Michalski M. (2012, July). The Configurations for Experimental Study of the Network Performance. In 8th IEEE, IET International Symposium on Communication Systems, Networks and Digital Signal Processing (CSNDSP 2012).
[20] Michalski M. (2014, July). The System for Delay Measurement in Ethernet Networks on NetFPGA Cards. In IEEE International Conference on High Performance Switching and Routing 2014, Vancouver, Canada.
[21] Spirent Test Center. http://www.spirent.
[22] JDSU - MTS5800 traffic generator and analyzer.
[23] Official website of AM Technologies Company.
[24] Xilinx University Program.