A Design Method of a Regular Expression Matching Circuit Based on Decomposed Automaton.
ABSTRACT This paper shows a design method for a regular expression matching circuit based on a decomposed automaton. To implement a regular expression matching circuit, we first convert a regular expression into a nondeterministic finite automaton (NFA). Then, to reduce the number of states, we convert the NFA into a merged-states nondeterministic finite automaton with unbounded string transition (MNFAU) using a greedy algorithm. Next, to realize it with a feasible amount of hardware, we decompose the MNFAU into a deterministic finite automaton (DFA) and an NFA. The DFA part is implemented by an off-chip memory and a simple sequencer, while the NFA part is implemented by a cascade of logic cells. We also show that the MNFAU-based implementation has lower area complexity than the DFA-based and the NFA-based ones. Experiments using regular expressions from SNORT show that the embedded memory size per character of the MNFAU is 17.17 to 148.70 times smaller than that of DFA methods, and that the number of logic cells (LCs) per character of the MNFAU is 1.56 to 5.12 times smaller than that of NFA methods. This paper also describes the details of our entry in the MEMOCODE 2010 HW/SW co-design contest, for which we won the first place award.
IEICE TRANS. INF. & SYST., VOL.E95–D, NO.2 FEBRUARY 2012
PAPER
Special Section on Reconfigurable Systems
Hiroki NAKAHARA†a), Tsutomu SASAO†b), and Munehiro MATSUURA†c), Members
key words: regular expression, NFA, DFA, MNFAU, FPGA
1. Introduction
1.1 Regular Expression Matching for Network Applications
A regular expression represents a set of strings. Regular expression matching detects a pattern represented by a regular expression. Various network applications (e.g., intrusion detection systems[8],[20], a spam filter[21], a virus scanning system[6], and an L7 filter[11]) use regular expression matching, and it accounts for a major part of their total computation time. The throughput of Perl-compatible regular expressions (PCRE)[17] on a general-purpose MPU is at most hundreds of megabits per second (Mbps)[18], which is too slow. Thus, hardware regular expression matching is required. Since network applications require high-mix low-volume production and flexible support for new protocols, FPGAs are widely used. Recently, dedicated high-speed transceivers for high-speed networks have been embedded in FPGAs, so we expect extensive use of FPGAs in the future.
Manuscript received April 22, 2011.
Manuscript revised September 2, 2011.
†The authors are with the Department of Computer Science and Electronics, Kyushu Institute of Technology, Iizuka-shi, 820-8502 Japan.
a) E-mail: nakahara@aries01.cse.kyutech.ac.jp
b) E-mail: sasao@cse.kyutech.ac.jp
c) E-mail: matsuura@cse.kyutech.ac.jp
DOI: 10.1587/transinf.E95.D.364
Different users require systems with different performance and price, so different architectures should be used. IXPs (Internet eXchange Points) and ISPs (Internet Service Providers) require extremely high throughput (e.g., more than tens of gigabits per second (Gbps)), so their systems tend to have high cost. However, low-end users, such as SOHO (small office and home office) users, need low-cost systems. A Xilinx FPGA consists of logic cells (LCs) and embedded memories (BRAMs)∗[22]. In the Xilinx Spartan III FPGA, an LC consists of a four-input look-up table (LUT) and a flip-flop (FF). Since the cost of an FPGA is proportional to the number of LCs, reducing the number of LCs reduces the system cost. In this paper, we propose a design method for a regular expression matching circuit that uses fewer LCs than conventional methods.
1.2 Proposed Method
The conventional NFA-based method uses single-character transitions[19]. In the circuit, each state of the NFA is implemented by an LC. Although a modern FPGA consists of LCs and embedded memories, the conventional NFA-based method fails to use the available embedded memories (Fig. 1(a)). In contrast, our previous method uses both LCs and embedded memory to implement the decomposed NFA with string (multi-character) transition[14]. Thus, this method requires fewer LCs than the conventional method (Fig. 1(b)). Moreover, in this paper, to further reduce the FPGA cost, we use off-chip SRAM to implement most of the regular expression matching circuit. Since off-chip SRAM costs much less than the FPGA, the total cost of the FPGA and the off-chip SRAM is also small. Since our method drastically reduces the FPGA resources required, it also reduces the system cost (Fig. 1(c)).
Fig. 1 Platforms for regular expression matching circuits.
∗In addition, it consists of a DSP (multiply and accumulation) block, a PLL, a DLL, and a PowerPC (embedded processor).
Copyright © 2012 The Institute of Electronics, Information and Communication Engineers
1.3 Analysis of Complexities of Finite Automata (FAs) on Parallel Hardware Model
Yu et al.[25] compared the complexities of the NFA and the DFA on a random access machine (RAM) model. However, to our knowledge, the complexities of FAs on a parallel hardware model have not been reported. In this paper, we compare the nondeterministic finite automaton (NFA), the deterministic finite automaton (DFA), and the decomposed NFA with string transition on the parallel hardware model. The comparison shows that the decomposed NFA requires much less hardware than the conventional methods.
1.4 Related Work
Regular expressions are detected by finite automata. In a DFA, for each state and each input, there is a unique transition, while in an NFA, for each state and each input, multiple transitions may exist. An NFA may also have ε-transitions, which move to other states without consuming input characters. Various DFA-based regular expression matching methods exist: the Aho-Corasick algorithm[1]; a bit-partition of the Aho-Corasick DFA by Tan et al.[23]; a combination of the bit-partitioned DFA and an MPU[3]; and a pipelined DFA[5]. Various NFA-based regular expression matching methods also exist: an algorithm that emulates the NFA (Baeza-Yates's NFA) by shift and AND operations on a computer[2]; an FPGA realization of Baeza-Yates's NFA (Sidhu-Prasanna method)[19]; prefix sharing of regular expressions[12]; and a method that maps repeated parts of regular expressions to the Xilinx FPGA primitive (SRL16)[4].
1.5 Organization of the Paper
The rest of the paper is organized as follows: Section 2 shows a regular expression matching circuit based on the finite automaton; Section 3 shows a regular expression matching circuit based on an NFA with string transition; Section 4 shows a design method of a regular expression matching circuit based on an NFA with string transition; Section 5 compares complexities on the parallel hardware model; Section 6 shows the experimental results; Section 7 shows the result of the MEMOCODE 2010 HW/SW co-design contest; and Section 8 concludes the paper.
This paper is an extension of previous publications[13]–[15].
2. Regular Expression Matching Circuit Based on Automaton
2.1 Regular Expression
A regular expression consists of characters and meta characters. A character is represented by eight bits. The length of a regular expression is the number of characters in it. Table 1 shows the meta characters considered in this paper, where r denotes a regular expression.
2.2 Regular Expression Matching Circuit Based on Deterministic Finite Automaton
Definition 2.1: A deterministic finite automaton (DFA) consists of a five-tuple $M_{DFA} = (S, \Sigma, \delta, s_0, A)$, where $S = \{s_0, s_1, \ldots, s_{q-1}\}$ is a finite set of states; $\Sigma$ is a finite set of input characters; $\delta$ is a transition function ($\delta : S \times \Sigma \to S$); $s_0 \in S$ is the initial state; and $A \subseteq S$ is the set of accept states. Since our system accommodates ASCII characters, it is convenient to choose $|\Sigma| = 2^8 = 256$.
Definition 2.2: Let $s \in S$ and $c \in \Sigma$. If $\delta(s, c) \in S$, then $c$ denotes a transition character from state $s$ to state $\delta(s, c)$.
To define a transition string accepted by the DFA, we extend the transition function $\delta$ to $\hat{\delta}$.
Definition 2.3: Let $\Sigma^+$ be a set of strings, and $\hat{\delta} : S \times \Sigma^+ \to S$ be the extended transition function. If $C \in \Sigma^+$ and $s \in S$, then $\hat{\delta}(s, C)$ represents a transition state of $s$ with respect to the input string $C$.
Definition 2.4: Suppose that $M_{DFA} = (S, \Sigma, \delta, s_0, A)$. Let $C_{in} \in \Sigma^+$. Then, $M_{DFA}$ accepts the string $C_{in}$ if the following relation holds:
\[ \hat{\delta}(s_0, C_{in}) \in A. \tag{1} \]
Let $c_i$ be a character of a string $C = c_0 c_1 \cdots c_n$, and $\delta$ be a transition function. Then, the extended transition function $\hat{\delta}$ is defined recursively as follows:
\[ \hat{\delta}(s, C) = \hat{\delta}(\delta(s, c_0), c_1 c_2 \cdots c_n). \tag{2} \]
From (1) and (2), the DFA performs string matching by repeating state transitions.
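As a minimal sketch of how Eqs. (1) and (2) turn into a matching procedure, the following Python fragment applies the transition function one character at a time. The four-state DFA for "A+B" is a hypothetical toy example chosen for brevity; it is not one of the automata in the paper's figures, and the names `delta_hat`, `accepts`, and `TOY_DELTA` are illustrative only.

```python
def delta_hat(delta, s, C):
    """Extended transition function: apply delta to each character of C in turn."""
    if not C:
        raise ValueError("C must be a non-empty string (an element of Sigma+)")
    nxt = delta[(s, C[0])]
    if len(C) == 1:
        return nxt
    return delta_hat(delta, nxt, C[1:])  # recursion of Eq. (2)


def accepts(delta, s0, A, C_in):
    """Eq. (1): C_in is accepted when delta_hat(s0, C_in) lands in A."""
    return delta_hat(delta, s0, C_in) in A


# Hypothetical DFA for "A+B" over {A, B}; state 3 is a dead state.
TOY_DELTA = {
    (0, "A"): 1, (0, "B"): 3,
    (1, "A"): 1, (1, "B"): 2,
    (2, "A"): 3, (2, "B"): 3,
    (3, "A"): 3, (3, "B"): 3,
}

print(accepts(TOY_DELTA, 0, {2}, "AAB"))  # True
print(accepts(TOY_DELTA, 0, {2}, "ABA"))  # False
```

The dictionary plays the role of the transition memory, and the argument `s` plays the role of the present-state register.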
Table 1 Meta characters for Perl-compatible regular expressions considered in this paper.
Regular Expression   Meaning
r1r2                 concatenation (r1 followed by r2)
r1|r2                r1 or r2 (union)
r*                   repeat r zero or more times (Kleene closure)
r+                   repeat r one or more times
r?                   repeat r zero or one time
r{n,m}               repeat r at least n and at most m times
.                    match any single character except newline (\n)
[]                   set of characters
[^]                  complement set of characters
^                    matching starts from the first character
$                    matching ends at the last character
Example 2.1: Figure 2 shows the DFA for the regular expression “A+[AB]{3}D”. Note that “A+” denotes one or more occurrences of “A”, and “[AB]{3}=[AB][AB][AB]” denotes three occurrences of “A” or “B”.
Fig. 2 DFA for the regular expression “A+[AB]{3}D”.
Example 2.2: Consider the string matching for an input “AABAD” using the DFA shown in Fig. 2. Let $s_0$ be the initial state. First, $\delta(s_0, A) = s_1$. Second, $\delta(s_1, A) = s_2$. Third, $\delta(s_2, B) = s_5$. Fourth, $\delta(s_5, A) = s_9$. Finally, $\delta(s_9, D) = s_{11}$. Since the state $s_{11}$ is an accept state, the string “AABAD” is accepted.
Note that [AB] denotes “A” or “B”, while “AB” denotes the concatenation of “A” and “B”. [AB][CD] denotes the set of four strings {AC, AD, BC, BD}.
Fig. 3 DFA machine.
Figure 3 shows the DFA machine, where the register stores the present state and the memory realizes the transition function $\delta$. Let $q = |S|$ be the number of states, and $n = |\Sigma|$ be the number of characters in $\Sigma$. Then, the amount of memory to implement the DFA is $\lceil \log_2 q \rceil \, 2^{\lceil \log_2 n \rceil + \lceil \log_2 q \rceil}$ bits†.
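The memory-size expression for the DFA machine can be evaluated directly. The sketch below is an illustration only: the function names are hypothetical, and the value q = 12 is an assumed state count used purely as a worked example.

```python
def ceil_log2(x):
    """Smallest k with 2**k >= x, computed with integers only (x >= 1)."""
    return (x - 1).bit_length()


def dfa_memory_bits(q, n=256):
    """ceil(log2 q) * 2**(ceil(log2 n) + ceil(log2 q)) bits for q states, n characters."""
    state_bits = ceil_log2(q)   # width of one memory word (the next state)
    char_bits = ceil_log2(n)    # address bits contributed by the input character
    return state_bits * 2 ** (char_bits + state_bits)


print(dfa_memory_bits(12))  # 4 * 2**(8 + 4) = 16384
```

The address is formed from the present state and the input character, and each word holds the next state, which is why the state width appears both as a factor and in the exponent.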
2.3 Regular Expression Matching Circuit Based on Nondeterministic Finite Automaton
Definition 2.5: A nondeterministic finite automaton (NFA) consists of a five-tuple $M_{NFA} = (S, \Sigma, \gamma, s_0, A)$, where $S$, $\Sigma$, $s_0$, and $A$ are the same as in Definition 2.1, while the transition function $\gamma : S \times (\Sigma \cup \{\varepsilon\}) \to P(S)$ is different. Note that $\varepsilon$ denotes an empty character, and $P(S)$ denotes the power set of $S$.
In the NFA, the empty (ε) input is permitted. Thus, a state of the NFA can transit to multiple states. A state transition with the ε input is called an ε-transition. In this paper, in a state transition diagram, an ε symbol with an arrow denotes an ε-transition.
Example 2.3: Figure 4 shows the NFA for the regular expression “A+[AB]{3}D”, and also shows the states visited when the input string is “AABAD”. Note that multiple state transitions occur in certain rows, since the NFA can be in multiple states for the input string “AABAD”. There is at least one path from the initial state $s_0$ to the accept state $s_5$. Thus, “AABAD” is accepted by this NFA.
Fig. 4 NFA for the regular expression “A+[AB]{3}D”.
Sidhu and Prasanna[19] realized an NFA with single-character transitions for regular expressions[2]. Figure 5 shows the circuit for the NFA. To realize the NFA, first, the memory detects the character for the state transition, and then character detection signals are sent to small machines that correspond to the states of the NFA. Each small machine is realized by a flip-flop and an AND gate. An ε-transition is realized by OR gates and interconnections on the FPGA. Then, the machines for the accept states generate the match signal.
Fig. 5 A circuit for the NFA shown in Fig. 4.
†Since the size of the register in the DFA machine is much smaller than that of the memory storing the transition function, we ignore the size of the register.
3. Regular Expression Matching Circuit Based on NFA with String Transition
3.1 MNFAU
Sidhu-Prasanna's method[19] does not use embedded memory†. So, their method is inefficient with respect to the resource utilization of an FPGA, since a modern FPGA consists of LCs and embedded memories. In the circuit for the NFA, each state is implemented by an LC of an FPGA. Thus, the necessary number of LCs increases with the number of states. To reduce the number of states, we propose a regular expression matching circuit based on a merged-states nondeterministic finite automaton with unbounded string transition (MNFAU). To convert an NFA into an MNFAU, we merge a sequence of states. However, to retain the equivalence between the NFA and the MNFAU, we merge the states as follows:
Lemma 3.1: Let $S = \{s_0, s_1, \ldots, s_{q-1}\}$ be the set of states for the NFA. Assume that for a subset $S' = \{s_k, s_{k+1}, \ldots, s_{k+p-1}\} \subseteq S$, each $s_i$ ($k \le i \le k+p-2$) goes to $s_{i+1}$ only. Then, the states in $S'$ are merged into one state of the MNFAU only if both the in-degree and the out-degree for $s_i$ ($k \le i \le k+p-1$) are one.
Definition 3.1: Suppose that a set of states $\{s_k, s_{k+1}, \ldots, s_{k+p}\}$ of an NFA is merged into a state $S_M$ of an MNFAU. A string $C = c_k c_{k+1} \cdots c_{k+p}$ is a transition string of $S_M$, where $c_j \in \Sigma$ is a transition character of $s_j$ for $j = k, k+1, \ldots, k+p$.
Example 3.1: In the NFA shown in Fig. 4, the set of states $\{s_2, s_3, s_4, s_5\}$ can be merged into a state of the MNFAU. However, the set of states $\{s_1, s_2\}$ cannot be merged, since $e_1 \neq 0$, where $e_i$ denotes the total number of ε-transition inputs and outputs of the state $s_i$ (see Problem 4.1).
Example 3.2: Figure 6 shows possible MNFAUs derived from the NFA shown in Fig. 4. In the NFA, since three states $\{s_2, s_3, s_4\}$ have $e_i = 0$, the number of possible MNFAUs is eight. In Fig. 6, the MNFAU (h) is the most compact MNFAU.
Fig. 6 Possible MNFAUs derived from the NFA shown in Fig. 4.
As shown in Example 3.2, a compact MNFAU can be derived from the given NFA. However, we must consider the restrictions of the hardware. The next section shows the hardware realization of the MNFAU, and Sect. 4 shows the design method for the MNFAU.
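The merging rule can be sketched as a single left-to-right scan over the NFA states: a state joins the current group whenever its predecessor has no ε-transition inputs or outputs. The sketch below assumes the states form a linear chain, as in the running example, and the ε-degree values are assumptions chosen to agree with Examples 3.1 and 3.2 (the values for $s_0$ and $s_5$ are not given in the text).

```python
def merge_states(eps_degree, chars):
    """Group chain states s_0..s_{q-1}; a group may extend past s_{i-1} only if e_{i-1} == 0.

    eps_degree[i] : assumed number of epsilon-transition inputs and outputs of s_i
    chars[i]      : transition character entering s_i ("" for the initial state)
    """
    groups = [[0]]
    for i in range(1, len(eps_degree)):
        if eps_degree[i - 1] == 0:
            groups[-1].append(i)   # s_{i-1} has no epsilon edges: keep merging
        else:
            groups.append([i])     # an epsilon edge forces a new MNFAU state
    strings = ["".join(chars[j] for j in g) for g in groups]
    return groups, strings


# Assumed values for the NFA of "A+[AB]{3}D": e_1 != 0 and e_2 = e_3 = e_4 = 0.
groups, strings = merge_states(
    [1, 2, 0, 0, 0, 0],
    ["", "A", "[AB]", "[AB]", "[AB]", "D"],
)
print(groups)   # [[0], [1], [2, 3, 4, 5]]
print(strings)  # ['', 'A', '[AB][AB][AB]D']
```

Under these assumptions the scan yields the transition strings “A” and “[AB][AB][AB]D”, which are the two strings of the most compact MNFAU in the example.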
3.2 Realization of MNFAU
An MNFAU is decomposed into a DFA and an NFA. The DFA is realized by the transition string detection circuit, and the NFA is realized by the state transition circuit. Figure 7 shows a decomposed MNFAU. Since transition strings do not include meta characters††, they are detected by exact matching. Exact matching is a subclass of regular expression matching, and the DFA can be realized by a feasible amount of hardware[25]. On the other hand, the state transition part treating the ε-transition is implemented by the cascade of logic cells shown in Fig. 5.
3.2.1 Transition String Detection Circuit
Since each state of the MNFAU consists of a different number of states of the NFA, the lengths of the transition strings for the states of the MNFAU are different. To detect multiple strings with different lengths, we use the Aho-Corasick DFA (AC-DFA)[1]. To obtain the AC-DFA, first, the transition strings are represented by a text tree (trie). Next, the failure paths that indicate the transitions for mismatches are attached to the text tree. Since the AC-DFA stores failure paths, no backtracking is required. By scanning the input only once, the AC-DFA can detect all the strings represented by the regular expressions. The AC-DFA is realized by the circuit shown in Fig. 3. Let $q = |S|$ be the number of states, and $n = |\Sigma|$ be the number of characters in $\Sigma$. Then, the amount of memory to implement the AC-DFA is $\lceil \log_2 q \rceil \, 2^{\lceil \log_2 n \rceil + \lceil \log_2 q \rceil}$ bits.
Example 3.3: Figure 8 illustrates the AC-DFA accepting transition strings “A” and “[AB][AB][AB]D” for the MNFAU shown in Fig. 6.
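The trie-plus-failure-path construction described above can be sketched in software as follows. This is the classic goto/failure formulation of the Aho-Corasick automaton, which follows failure links at run time; the circuit in the paper instead stores a fully expanded transition table in memory. The literal strings "A" and "ABD" are assumed stand-ins for the transition strings, since character classes such as [AB] would first have to be expanded.

```python
from collections import deque


def build_ac(patterns):
    """Build goto (trie), failure links, and output sets for the patterns."""
    goto, fail, out = [{}], [0], [set()]
    for idx, pat in enumerate(patterns):
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(idx)
    queue = deque(goto[0].values())        # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]         # inherit matches that end at the suffix
            queue.append(t)
    return goto, fail, out


def search(text, patterns):
    """Scan text once; return (end position, pattern) for every occurrence."""
    goto, fail, out = build_ac(patterns)
    s, hits = 0, []
    for pos, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]                    # follow the failure path on a mismatch
        s = goto[s].get(ch, 0)
        hits.extend((pos, patterns[i]) for i in sorted(out[s]))
    return hits


print(search("AABD", ["A", "ABD"]))  # [(0, 'A'), (1, 'A'), (3, 'ABD')]
```

Each reported hit corresponds to a detection signal that the transition string detection circuit would send to the state transition circuit.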
†Their method uses single-character detectors (comparators) instead of the memory shown in Fig. 5.
††However, the meta character “[]” can be used.
3.2.2 State Transition Circuit[15]
In an NFA, each state is realized by a small machine consisting of a flip-flop and an AND gate. Figure 9 shows the state transition circuit for the MNFAU. When the AC-DFA detects the transition string (“ABD” in Fig. 9), a detection signal is sent to the state transition circuit. Then, the state transition is performed. The AC-DFA scans a character in every clock, while the state transition requires p clocks, where p denotes the length of the transition string. Thus, a (p − 1)-bit shift register is inserted between small machines to synchronize with the AC-DFA (in Fig. 9, a two-bit shift register is inserted). A four-input LUT of a Xilinx FPGA can also be used as a shift register with up to 16 bits (SRL16)[24]. Figure 10 shows two LUT modes of a Xilinx FPGA†. With the SRL16, we can reduce the necessary number of LUTs and flip-flops.
Fig. 7 Decomposed MNFAU.
Fig. 8 AC-DFA accepting strings “A” and “[AB][AB][AB]D”.
Fig. 9 State transition circuit for the MNFAU.
Fig. 10 Two LUT modes for Xilinx FPGA.
Figure 11 shows an example circuit for the decomposed MNFAU. We decompose the MNFAU into the transition string detection circuit and the state transition circuit. The transition function for the AC-DFA is realized by the off-chip memory (i.e., SRAM), while other parts are realized by the FPGA. In the AC-DFA, a register with $\lceil \log_2 q \rceil$ bits shows the present state, where q is the number of states for the AC-DFA. On the other hand, a u-bit detection signal is necessary for the state transition circuit, where u is the number of states for the MNFAU. We use a decoder that converts a $\lceil \log_2 q \rceil$-bit state to a u-bit detection signal. Since the decoder is relatively small, it is implemented by the embedded memory†† in the FPGA.
Fig. 11 An example circuit for the decomposed MNFAU.
Example 3.4: In Fig. 11, the address for the decoder memory corresponds to the assigned state number for the AC-DFA shown in Fig. 8. The decoder memory produces the detection signal for the state transition circuit. For the NFA-based regular expression matching circuit shown in Fig. 5, the number of LUTs is five, and the number of FFs is five. On the other hand, for the MNFAU-based regular expression matching circuit shown in Fig. 11, the number of LUTs is three, and the number of FFs is two.
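The timing role of the shift register can be illustrated with a cycle-level software model. The sketch below is an assumption-laden simplification, not the circuit of Fig. 11: it models the MNFAU for “A+[AB]{3}D” with an initial state, a state entered by “A” (with a self-loop for “A+”), and a state entered by “[AB][AB][AB]D”; detection signals are computed by direct comparison rather than by an AC-DFA, and the match is anchored at both ends of the input.

```python
def ends_with(text, t, classes):
    """Detection signal: do the last len(classes) characters read by time t match?"""
    p = len(classes)
    return t >= p and all(text[t - p + i] in classes[i] for i in range(p))


def mnfau_accepts(text):
    head = ["A"]                        # transition string "A", length p = 1
    tail = ["AB", "AB", "AB", "D"]      # transition string "[AB][AB][AB]D", p = 4
    n = len(text)
    m0 = [t == 0 for t in range(n + 1)]  # initial state, active before any input
    m1 = [False] * (n + 1)
    m2 = [False] * (n + 1)
    for t in range(1, n + 1):
        # A state fires when its string is detected now AND its predecessor
        # was active p clocks earlier (the delay supplied by the shift register).
        m1[t] = ends_with(text, t, head) and (m0[t - 1] or m1[t - 1])
        p = len(tail)
        m2[t] = ends_with(text, t, tail) and m1[t - p]
    return m2[n]


print(mnfau_accepts("AABAD"))  # True
print(mnfau_accepts("ABAD"))   # False
```

Looking back `p` clocks in the lists `m0` and `m1` is the software counterpart of delaying the predecessor's output through the state flip-flop and a (p − 1)-bit shift register.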
†In the Xilinx Spartan III FPGA, a CLB consists of four SLICEs, and a SLICE consists of two LCs. In the CLB, two SLICEs can be configured as an SRL16 or an LUT, while the other two SLICEs can be configured as an LUT only.
††It can also be implemented by LUTs. However, when q is large, it requires a large number of LUTs. In our experiment for SNORT, q = 10,066.
4. Design of a Decomposed Regular Expression Matching Circuit
4.1 Design Flow
This section shows the design method for the decomposed regular expression matching circuit. Figure 12 shows the design flow. First, the NFA is constructed from the given regular expression. Then, it is converted into an MNFAU. The conversion method is described in Sect. 4.3. Next, the MNFAU is decomposed into the DFA part and the NFA part. The DFA part is realized by the sequencer shown in Fig. 3 and a decoder, while the NFA part is realized by the cascade of LCs shown in Fig. 9. Then, the decomposed MNFAU is converted into an HDL source file. Finally, we use the Xilinx ISE Design Suite, an FPGA synthesis tool, to generate the configuration data for the FPGA.
Fig. 12 Design flow for regular expression matching circuit.
4.2 Construction of the NFA
The regular expressions shown in Table 1 satisfy the following relations:
\[ r+ = r\,r* \]
\[ r? = (\varepsilon \mid r) \]
\[ r\{n,m\} = \overbrace{r \cdots r}^{n} \mid \overbrace{r \cdots r}^{n+1} \mid \cdots \mid \overbrace{r \cdots r}^{m} \]
\[ [c_i c_j] = c_i \mid c_j, \]
where $r$ denotes a regular expression, $c_i \in \Sigma$ denotes a character, $\Sigma$ denotes the set of characters, $\varepsilon$ denotes an empty character, and $c_i$, $c_j$ denote distinct characters. Thus, to construct an NFA from a regular expression, it is sufficient to consider the “state transition with a character”, “concatenation”, “Kleene closure (*)”, and “union (|)”. To construct the NFA from the given regular expression, we use the modified McNaughton-Yamada construction[9]. Figure 13 shows the modified McNaughton-Yamada construction.
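The rewriting relations above can be checked mechanically. The sketch below rewrites r+, r?, and r{n,m} using only concatenation, union, and Kleene closure, and compares the result with Python's `re` module on short strings. The helper names are hypothetical, and Python's regex dialect is used only as a convenient stand-in for PCRE.

```python
import re
from itertools import product


def expand_plus(r):
    """r+ = r r*"""
    return f"(?:{r})(?:{r})*"


def expand_optional(r):
    """r? = (epsilon | r); the empty alternative plays the role of epsilon."""
    return f"(?:|{r})"


def expand_bounded(r, n, m):
    """r{n,m} as a union of the n-fold to m-fold concatenations of r."""
    unit = f"(?:{r})"
    return "(?:" + "|".join(unit * k for k in range(n, m + 1)) + ")"


def equivalent(p1, p2, alphabet="ABD", max_len=4):
    """Compare two patterns on every string over alphabet up to max_len."""
    for k in range(max_len + 1):
        for tup in product(alphabet, repeat=k):
            s = "".join(tup)
            if bool(re.fullmatch(p1, s)) != bool(re.fullmatch(p2, s)):
                return False
    return True


print(expand_bounded("[AB]", 3, 3))  # (?:(?:[AB])(?:[AB])(?:[AB]))
print(equivalent(expand_bounded("[AB]", 1, 3), "[AB]{1,3}"))  # True
```

The exhaustive comparison is only a sanity check on short inputs, not a proof of equivalence.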
4.3 Design Algorithm for the Decomposed MNFAU
As shown in Example 3.2, any NFA can be converted into an MNFAU satisfying Lemma 3.1. Thus, the conversion into a compact MNFAU is important. The conversion problem to an MNFAU from an NFA is formulated as follows:
Problem 4.1: Let $S = \{s_0, s_1, \ldots, s_{q-1}\}$ be the set of states of the NFA; $t$ be the number of states of the NFA with $e_i > 0$ ($0 \le i \le q-1$), where $e_i$ is the total number of ε-transition inputs and outputs in the state $s_i$; $(S_1, S_2, \ldots, S_u)$ be a partition of $S$, where $S_i \subseteq S$ and $S_i \cap S_j = \phi$ ($i \neq j$); $C_i$ be a transition string for a set of states $S_i$; $C = \{C_1, C_2, \ldots, C_u\}$ be a set of transition strings; $M(C)$ be the memory size of the AC-DFA for $C$; and $M_{off-chip}$ be the memory size of the off-chip memory. Then, find a partition of $S$ that minimizes $u$ satisfying the memory constraint $M(C) < M_{off-chip}$, where $u$ is the number of subsets in the partition. Note that, for each $S_i = \{s_k, s_{k+1}, \ldots, s_{k+p}\}$, $e_j = 0$ for $j = k, k+1, \ldots, k+p-1$.
Fig. 13 Modified McNaughton-Yamada constructions.
Since the number of possible MNFAUs, $2^{q-t-1}$, can be very large, an exhaustive method to find a minimum MNFAU satisfying the off-chip memory constraint $M(C) < M_{off-chip}$ is impractical. In this paper, we propose a greedy method to find a near-minimum MNFAU.
Algorithm 4.1: (Find a near-minimum MNFAU from the NFA)
Let $S = \{s_0, s_1, \ldots, s_{q-1}\}$ be a set of states for the NFA, and $M_{off-chip}$ be the memory size of the off-chip memory.
1. Obtain a minimum partition $S = S_1 \cup S_2 \cup \cdots \cup S_u$, where $S_i \cap S_j = \phi$ ($i \neq j$), such that, for each $S_i = \{s_k, s_{k+1}, \ldots, s_{k+p}\}$, $e_j = 0$ for $j = k, k+1, \ldots, k+p-1$. Then, obtain a set of transition strings $C = \{C_1, C_2, \ldots, C_u\}$.
2. Construct the AC-DFA for $C$. Then, obtain $M(C)$.
3. If $M(C) \le M_{off-chip}$, then go to Step 6.
4. Select the maximum $S_i = \{s_k, s_{k+1}, \ldots, s_{k+n}\}$ from $S$, and partition it into two subsets $S_{i1}$ and $S_{i2}$, where $S_{i1} = \{s_k, s_{k+1}, \ldots, s_{k+\lfloor n/2 \rfloor}\}$ and $S_{i2} = \{s_{k+\lfloor n/2 \rfloor+1}, \ldots, s_{k+n}\}$. Also, obtain a set of transition strings $C = \{C_1, C_2, \ldots, C_{i1}, C_{i2}, \ldots, C_u\}$, where $C_{i1}$ is a transition string for $S_{i1}$, and $C_{i2}$ is that for $S_{i2}$.
5. Go to Step 2.
6. Terminate the algorithm.
Algorithm 4.1 repeatedly partitions the maximum subset of $S$ until the memory size $M(C)$ does not exceed $M_{off-chip}$.
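The greedy loop of Algorithm 4.1 can be sketched as follows. Several simplifications are assumptions of this sketch rather than statements about the authors' implementation: the NFA is a linear chain, every transition character is a single literal (no character classes), the number of AC-DFA states is taken to be the number of trie nodes, and $M(C)$ is evaluated with the memory formula of Sect. 3.2.1. The largest group is halved so that the first half receives the extra state when the size is odd.

```python
def trie_states(strings):
    """Number of trie nodes (root included) for the given transition strings."""
    prefixes = {""}
    for s in strings:
        prefixes.update(s[:i] for i in range(1, len(s) + 1))
    return len(prefixes)


def memory_bits(strings, n=256):
    """Assumed M(C): ceil(log2 q) * 2**(ceil(log2 n) + ceil(log2 q))."""
    q = trie_states(strings)
    state_bits = max(1, (q - 1).bit_length())
    return state_bits * 2 ** ((n - 1).bit_length() + state_bits)


def greedy_mnfau(eps_degree, chars, m_off_chip):
    # Step 1: minimum partition; extend a group past s_{i-1} only if e_{i-1} == 0.
    groups = [[0]]
    for i in range(1, len(chars)):
        if eps_degree[i - 1] == 0:
            groups[-1].append(i)
        else:
            groups.append([i])
    while True:
        strings = ["".join(chars[j] for j in g) for g in groups]
        if memory_bits(strings) <= m_off_chip:      # Steps 2, 3, and 6
            return groups, strings
        big = max(range(len(groups)), key=lambda i: len(groups[i]))
        g = groups[big]
        if len(g) == 1:
            raise ValueError("memory constraint cannot be met")
        half = (len(g) - 1) // 2 + 1                # Step 4: split the largest group
        groups[big:big + 1] = [g[:half], g[half:]]  # Step 5: loop again


E = [1, 2, 0, 0, 0, 0]                 # assumed epsilon degrees
CHARS = ["", "A", "A", "B", "A", "D"]  # assumed literal transition characters
print(greedy_mnfau(E, CHARS, 6144))    # ([[0], [1], [2, 3, 4, 5]], ['', 'A', 'ABAD'])
print(greedy_mnfau(E, CHARS, 5000))    # ([[0], [1], [2, 3], [4, 5]], ['', 'A', 'AB', 'AD'])
```

Tightening the memory budget from 6144 to 5000 bits forces one split, which shortens the longest transition string and therefore shrinks the trie at the cost of one more MNFAU state.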
5. Complexity of Regular Expression Matching Circuit on Parallel Hardware Model
The Xilinx FPGA consists of logic cells (LCs) and embedded memories. An LC consists of a four-input look-up table (LUT) and a flip-flop (FF)[22]. Therefore, as for the area complexity, we consider both the LC complexity and the embedded memory complexity.
5.1 Theoretical Analysis
5.1.1 Aho-Corasick DFA
As shown in Fig. 3, a machine for the DFA consists of a register storing the present state, and the memory for the state transition. The DFA machine reads one character and computes the next state in every clock. Thus, the time complexity is O(1). Also, since the size of the register is fixed, the LC complexity is O(1). Yu et al.[25] showed that, for m regular expressions with length s, the memory complexity is $O(|\Sigma|^{sm})$ for the Aho-Corasick DFA, where $|\Sigma|$ denotes the number of characters in $\Sigma$.
5.1.2 Baeza-Yates NFA
As shown in Fig. 5, an NFA consists of the memory for the transition character detection, and a cascade of LCs, each of which consists of an LUT (realizing AND and OR gates) and an FF. Thus, for m regular expressions with length s, the LC complexity is O(ms). Since the amount of memory for the transition character detection is $m \times |\Sigma| \times s$, the memory complexity is O(ms). A regular expression matching circuit based on an NFA has s states and processes one character every clock, including ε-transitions. By using m circuits shown in Fig. 5, the circuit can match m regular expressions in parallel. Thus, the time complexity is O(1).
5.1.3 Decomposed MNFAU
As shown in Fig. 7, the decomposed MNFAU consists of a transition string detection circuit and a state transition circuit. The transition string detection circuit is realized by the DFA machine shown in Fig. 3. Let $p_{max}$ be the maximum length of the transition string in the MNFAU, and $|\Sigma|$ be the number of characters in the set $\Sigma$. From the analysis of the DFA[7], the memory complexity is $O(|\Sigma|^{p_{max}})$, while the LC complexity for the AC-DFA machine is O(1). The state transition circuit is realized by the cascade of LCs shown in Fig. 11. Let $p_{ave}$ be the average number of merged states in the NFA, s be the length of the regular expression, and m be the number of regular expressions. Since one state in the MNFAU corresponds to $p_{ave}$ states in the NFA, the LC complexity is $O(ms/p_{ave})$. By using m parallel circuits, the circuit matches m regular expressions in parallel. Thus, the time complexity is O(1).
Note that, in most cases, the NFA requires a longer word length than the MNFAU. The NFA requires sm-bit words, while the MNFAU requires $\lceil \log_2 q \rceil$-bit words†, where q is the number of states for the MNFAU. For the NFA, off-chip memories are hard to use, since the FPGA has a limited number of pins. Thus, the NFA requires a large number of on-chip memories. On the other hand, for the MNFAU, off-chip memory is easy to use, since the required number of pins is small. Although the MNFAU requires larger memory than the NFA, the MNFAU can use off-chip memory and a small FPGA. This reduces the hardware cost.
Table 2 Complexities for the NFA, the DFA, and the decomposed MNFAU on the parallel hardware model.
                     Time   Area: Memory       Area: #LC
Baeza-Yates's NFA    O(1)   O(ms)              O(ms)
Aho-Corasick DFA     O(1)   O(|Σ|^{ms})        O(1)
Decomposed MNFAU     O(1)   O(|Σ|^{p_max})     O(ms/p_ave)
Fig. 14 Relation between the length s of regular expression and the number of LCs.
Table 2 compares the area and time complexities for the NFA, the DFA, and the decomposed MNFAU on the parallel hardware model. As shown in Table 2, by using the decomposed MNFAU, the memory size is reduced to $1/|\Sigma|^{ms-p_{max}}$ of the DFA, and the number of LCs is reduced to $1/p_{ave}$ of the NFA.
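The two reduction factors follow from dividing the corresponding entries of Table 2. As a short worked step, treating the complexity orders as if they were exact sizes (constant factors are ignored, so this is a ratio of orders rather than of measured sizes):

```latex
\[
\frac{|\Sigma|^{p_{max}}}{|\Sigma|^{ms}} = \frac{1}{|\Sigma|^{ms - p_{max}}},
\qquad
\frac{ms / p_{ave}}{ms} = \frac{1}{p_{ave}}.
\]
```

The first ratio compares the memory of the decomposed MNFAU with that of the Aho-Corasick DFA, and the second compares the number of LCs of the decomposed MNFAU with that of the NFA.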
5.2 Analysis Using SNORT
To verify the analysis of the previous part, we compared the memory size and the number of LCs for practical regular expressions. We selected 80 regular expressions from the intrusion detection system SNORT[20], and for each regular expression, we generated the DFA, the NFA, and the decomposed MNFAU. Then, we obtained the number of LCs and the memory size. Figure 14 shows the relation between the length s of the regular expression and the number of LCs, while Fig. 15 shows the relation between s and the memory size. Note that Fig. 14 has a linear vertical axis, while Fig. 15 has a logarithmic vertical axis. As shown in Fig. 14, the ratio between the number of LCs and s is a constant. On the other hand, as shown in Fig. 15, the ratio between the memory size and s increases exponentially.
Therefore, both the theoretical analysis and the experiment using SNORT show that the decomposed MNFAU realizes regular expressions efficiently.
†For example, in SNORT, the value of sm is about 100,000, while $\lceil \log_2 q \rceil = 14$.
Table 3 Comparison with other methods.
Method                                              FA Type   FPGA        Th (Gbps)   #LC       MEM (Kbits)   #Char    #LC/#Char   MEM/#Char
Pipelined DFA[5] (ISCA’06)                          DFA       Virtex 2    4.0         247,000   3,456         11,126   22.22       3182.2
MPU+Bit-partitioned DFA[3] (FPL’06)                 DFA       Virtex 4    1.4         N/A       6,000         16,715   N/A         367.5
Improvement of Sidhu-Prasanna method[4] (FPT’06)    NFA       Virtex 4    2.9         25,074    0             19,580   1.28        0
MNFA(3)[14] (SASIMI’10)                             MNFA(p)   Virtex 6    3.2         4,707     441           12,095   0.39        37.3
MNFAU (Proposed method)                             MNFAU     Spartan 3   1.6         19,552    1,585         75,633   0.25        21.4
Table 4 Result of MEMOCODE2010 HW/SW codesign contest.
Place    Team Name             Number of Regular Expressions  Performance (Mbps)  Platform                   Institution
1 (tie)  Sasao Lab (Our team)  140                            798                 FPGA: Altera Stratix III   Kyushu Institute of Technology, Japan
1 (tie)  Limenators            140                            500                 FPGA: Xilinx V5LX330       IBM Research, USA
3        SpbSU                 126                            500                 FPGA: Xilinx ML505         Lanit-Tercom, Russia
4        Kraaken               85                             734                 FPGA: Xilinx XUPV5         AMD, USA
5        Battery               35                             524                 GPU: NVIDIA Tesla T10      Iowa State University, USA
6        Team IISC             25                             584                 FPGA: Xilinx XUPV5         Indian Institute of Science, India
7        Tosan                 25                             524                 GPU: NVIDIA GTX 295        Sharif University of Technology, Iran
8        [Ii][Ss][Uu][02]{4}   42                             534                 FPGA: Altera Stratix III   Iowa State University, USA
Fig. 15 Relation between the length s of the regular expression and the memory size.
6. Experimental Results
6.1 Implementation of the MNFAU
We selected regular expressions from SNORT (an open-source
intrusion detection system), and generated the decomposed
MNFAU. Then, we implemented it on the Xilinx Spartan III
FPGA (XC3S4000: 62,208 logic cells (LCs), 1,728 Kbits of
BRAM in total). The total number of regular expressions
is 1,114 (75,633 characters). The number of states for
the MNFAU is 12,673, and the number of states for the
AC-DFA for the transition strings is 10,066. This implementation
requires 19,552 LCs and an off-chip memory of 16 Mbits.
Note that the 16 Mbits off-chip SRAM is used to store the
transition function of the AC-DFA, while 1,585 Kbits of
on-chip BRAM is used to realize the decoder. The FPGA
operates at 271.2 MHz. However, due to the limitation on the
clock frequency imposed by the off-chip SRAM, the system clock
was set to 200 MHz. Our regular expression matching circuit
scans one character (8 bits) in every clock cycle. Thus, the
throughput is 0.2 × 8 = 1.6 Gbps.
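The throughput arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `throughput_gbps` is hypothetical, and the second figure is a what-if value for the 271.2 MHz FPGA clock, not a measured result from the paper.

```python
def throughput_gbps(clock_mhz: float, bits_per_cycle: int = 8) -> float:
    """Throughput in Gbps of a matcher that consumes `bits_per_cycle` bits per clock."""
    return clock_mhz * 1e6 * bits_per_cycle / 1e9

# System clock limited by the off-chip SRAM: one 8-bit character per cycle.
system_gbps = throughput_gbps(200.0)

# Assumption for illustration only: the rate the FPGA logic alone would allow
# if the off-chip SRAM were not the bottleneck.
fpga_only_gbps = throughput_gbps(271.2)

print(f"system: {system_gbps:.2f} Gbps, FPGA logic alone: {fpga_only_gbps:.2f} Gbps")
```

The gap between the two values shows that the off-chip memory, not the logic, bounds the reported throughput.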
6.2 Comparison with Other Methods
Table 3 compares our method with other methods. In Table 3,
Th denotes the throughput (Gbps); #LC denotes the
number of logic cells; MEM denotes the amount of embedded
memory for the FPGA (Kbits); and #Char denotes the
number of characters for the regular expression. Table 3
shows that, as for the embedded memory size per character,
the MNFAU requires 17.17–148.70 times smaller memory
than the DFA methods. Also, as for the number of LCs
per character, the MNFAU requires 1.56–5.12 times fewer
LCs than the NFA methods.
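The ranges quoted above follow from dividing the per-character columns of Table 3 by the MNFAU row. A minimal sketch, using only the per-character values printed in Table 3 (the dictionary keys and the helper `ratios_vs` are hypothetical names; the lower bound 1.56 appears to come from the MNFA(3) row):

```python
# Per-character figures copied from Table 3 (MEM/#Char and #LC/#Char columns).
mem_per_char = {
    "Pipelined DFA [5]": 3182.2,
    "Bit-partitioned DFA [3]": 367.5,
    "MNFAU": 21.4,
}
lc_per_char = {
    "Sidhu-Prasanna NFA [4]": 1.28,
    "MNFA(3) [14]": 0.39,
    "MNFAU": 0.25,
}

def ratios_vs(table, baseline="MNFAU"):
    """Ratio of every other method's per-character cost to the baseline's."""
    base = table[baseline]
    return {name: round(value / base, 2)
            for name, value in table.items() if name != baseline}

mem_ratios = ratios_vs(mem_per_char)
lc_ratios = ratios_vs(lc_per_char)
print(mem_ratios)
print(lc_ratios)
```

The smallest and largest entries of each dictionary give the two ends of the corresponding range.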
7. Result of Eighth MEMOCODE2010 HW/SW Codesign Contest [16]
In July 2010, the eighth ACM/IEEE international conference
on formal methods and models for codesign
(MEMOCODE2010) challenged teams to implement the architecture
for a unique type of deep packet inspector called
CANSCID (Combined Architecture for Stream Categorization
and Intrusion Detection). The metrics for judging a design
are:
1. The number of category patterns and intrusion patterns
represented by regular expressions.
2. The system throughput must be higher than the line rate
of 500 Mbps.
We implemented CANSCID on a Terasic Technologies Inc.
DE3 development board utilizing an Altera
Stratix III FPGA (EP3S340H1152C3N4). We realized the
140 regular expression patterns of the design contest using
MNFA(3)† [15]. Table 4 shows the result of the design
contest. Team Sasao Lab (our team) and Limenators were joint
winners, each implementing 140 patterns while maintaining
a line rate of 500 Mbps. Only our team used the MNFA(3)
approach rather than DFAs for the regular expression matching.
In this way, we could implement regular expressions
compactly while maintaining the highest speed.
†MNFA(3) is a special case of the MNFAU whose transition strings
have at most three characters [14]. After the design contest, we
generalized MNFA(3) to the MNFAU. Although MNFA(3) is
easy to generate, it requires more hardware than the MNFAU.
IEICE TRANS. INF. & SYST., VOL.E95–D, NO.2 FEBRUARY 2012
8. Conclusion
In this paper, we proposed a regular expression matching
circuit based on a decomposed MNFAU. To implement the
circuit, first, we converted the regular expressions into an
NFA. Then, to reduce the number of states, we converted
the NFA into an MNFAU by a greedy method. Next, to
realize it with a feasible amount of hardware, we decomposed
the MNFAU into a transition string detection part and
a state transition part. The transition string detection part
was implemented by an off-chip memory and a simple
sequencer, while the state transition part was implemented by
a cascade of logic cells. Also, this paper showed that the
MNFAU-based implementation has lower area complexity
than the DFA- and the NFA-based ones. The implementation
of SNORT showed that, as for the embedded memory size
per character, the MNFAU is 17.17–148.70 times smaller
than DFA methods. Also, as for the number of LCs per
character, the MNFAU is 1.56–5.12 times smaller than NFA
methods. With MNFA(3), we won the first place award in
the MEMOCODE2010 HW/SW codesign contest.
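To make the decomposition into a transition string detection part and a state transition part concrete, the following is a small software model, not the authors' circuit. It rests on several assumptions: the MNFAU is given as a hypothetical edge list `(source, transition string, destination)`, string detection is done by a naive slice comparison standing in for the AC-DFA held in off-chip memory, and a bounded history of active-state bits stands in for the shift registers of the logic-cell cascade.

```python
from collections import deque

def mnfau_search(edges, start, accept, text):
    """Return True if `text` contains a match of the MNFAU described by `edges`.

    edges: list of (src_state, transition_string, dst_state); strings are non-empty.
    """
    states = {start, accept}
    for src, _, dst in edges:
        states.update((src, dst))
    max_len = max(len(s) for _, s, _ in edges)
    # history[-k] holds the active-state bits from k characters ago,
    # playing the role of the shift registers in the state transition part.
    history = deque([{q: q == start for q in states}], maxlen=max_len)
    for t in range(1, len(text) + 1):
        # The start state is always active, so the search is unanchored.
        now = {q: q == start for q in states}
        for src, s, dst in edges:
            p = len(s)
            # Transition string detection: does s end at position t?
            # (naive comparison here; the paper uses an AC-DFA for this part)
            detected = t >= p and text[t - p:t] == s
            # State transition: dst becomes active if src was active p characters earlier.
            if detected and history[-p][src]:
                now[dst] = True
        if now[accept]:
            return True
        history.append(now)
    return False

# Hypothetical MNFAU for the regular expression abc(ab)*d.
EDGES = [("q0", "abc", "q1"), ("q1", "ab", "q1"), ("q1", "d", "q2")]
print(mnfau_search(EDGES, "q0", "q2", "xxabcababd"))
```

Because each edge consumes a whole string, this example needs three states where a character-level NFA for the same expression would need one state per character, which is the source of the LC savings discussed above.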
Acknowledgments
This research is supported in part by the grant of Regional
Innovation Cluster Program (Global Type, 2nd Stage).
Discussion with Prof. J. T. Butler was quite useful. Dr. Hiroaki
Yoshida encouraged us to participate in the design contest.
The hard work of the organizers of the MEMOCODE2010
HW/SW codesign contest is also appreciated.
References
[1] A.V. Aho and M.J. Corasick, “Efficient string matching: An aid to bibliographic search,” Commun. ACM, vol.18, no.6, pp.333–340, 1975.
[2] R. Baeza-Yates and G.H. Gonnet, “A new approach to text searching,” Commun. ACM, vol.35, no.10, pp.74–82, Oct. 1992.
[3] Z.K. Baker, H. Jung, and V.K. Prasanna, “Regular expression software deceleration for intrusion detection systems,” FPL’06, pp.28–30, 2006.
[4] J. Bispo, I. Sourdis, J.M.P. Cardoso, and S. Vassiliadis, “Regular expression matching for reconfigurable packet inspection,” FPT’06, pp.119–126, 2006.
[5] B.C. Brodie, D.E. Taylor, and R.K. Cytron, “A scalable architecture for high-throughput regular-expression pattern matching,” ISCA’06, pp.191–202, 2006.
[6] “Clam anti virus: open source antivirus toolkit,” http://www.clamav.net/lang/en/
[7] R. Dixon, O. Egecioglu, and T. Sherwood, “Automata-theoretic analysis of bit-split languages for packet scanning,” CIAA’08, pp.141–150, 2008.
[8] “Firekeeper: Detect and block malicious sites,” http://firekeeper.mozdev.org/
[9] T. Ganegedara, Y.E. Yang, and V.K. Prasanna, “Automation framework for large-scale regular expression matching on FPGA,” FPL2010, pp.50–55, 2010.
[10] Z. Kohavi, Switching and Finite Automata Theory, McGraw-Hill, 1979.
[11] “Application layer packet classifier for linux,” http://l7filter.sourceforge.net/
[12] C. Lin, C. Huang, C. Jiang, and S. Chang, “Optimization of regular expression pattern matching circuits on FPGA,” DATE’06, pp.12–17, 2006.
[13] H. Nakahara, T. Sasao, and M. Matsuura, “A regular expression matching circuit based on a decomposed automaton,” ARC’11, Lecture Notes in Computer Science, no.6578, pp.16–28, March 2011.
[14] H. Nakahara, T. Sasao, and M. Matsuura, “A regular expression matching circuit based on a modular nondeterministic finite automaton with multi-character transition,” SASIMI’10, pp.359–364, Taipei, Oct. 2010.
[15] H. Nakahara, T. Sasao, and M. Matsuura, “A regular expression matching using nondeterministic finite automaton,” MEMOCODE’10, pp.73–76, Grenoble, France, July 2010.
[16] M. Pellauer, A. Agarwal, A. Khan, M.C. Ng, M. Vijayaraghavan, F. Brewer, and J. Emer, “Design contest overview: combined architecture for network stream categorization and intrusion detection (CANSCID),” MEMOCODE’10, pp.69–72, Grenoble, France, July 2010.
[17] “PCRE: Perl Compatible Regular Expression,” http://www.pcre.org/.
[18] H.C. Roan, W.J. Hawang, and C.T. Dan Lo, “Shift-or circuit for efficient network intrusion detection pattern matching,” FPL’06, pp.785–790, 2006.
[19] R. Sidhu and V.K. Prasanna, “Fast regular expression matching using FPGA,” FCCM’01, pp.227–238, 2001.
[20] “SNORT official web site,” http://www.snort.org.
[21] “SPAMASSASSIN: Open-Source Spam Filter,” http://spamassassin.apache.org/
[22] “Spartan III data sheet,” http://www.xilinx.com/
[23] L. Tan and T. Sherwood, “A high throughput string matching architecture for intrusion detection and prevention,” ISCA’05, pp.112–122, 2005.
[24] “Using Lookup tables as shift registers (SRL16),” http://www.xilinx.com/support/documentation/application notes/xapp465.pdf
[25] F. Yu, Z. Chen, Y. Diao, T.V. Lakshman, and R.H. Katz, “Fast and memory-efficient regular expression matching for deep packet inspection,” ANCS’06, pp.93–102, 2006.
Hiroki Nakahara received the BE, ME, and
Ph.D. degrees in computer science from Kyushu
Institute of Technology, Fukuoka, Japan, in
2003, 2005, and 2007, respectively. He has
held a research position at Kyushu Institute of
Technology, Iizuka, Japan. Now, he is an assistant
professor at Kagoshima University, Japan.
He received the 8th IEEE/ACM MEMOCODE
Design Contest 1st Place Award in 2010, the
SASIMI Outstanding Paper Award in 2010, and
IPSJ Yamashita SIG Research Award in 2011,
respectively. His research interests include logic synthesis, reconfigurable
architecture, and embedded systems. He is a member of the IEEE.
Tsutomu Sasao received the B.E., M.E.,
and Ph.D. degrees in Electronics Engineering
from Osaka University, Osaka, Japan, in 1972,
1974, and 1977, respectively. He has held
faculty/research positions at Osaka University,
Japan, IBM T. J. Watson Research Center,
Yorktown Heights, NY, and the Naval Postgraduate
School, Monterey, CA. He has served as
the Director of the Center for Microelectronic
Systems at the Kyushu Institute of Technology,
Iizuka, Japan. Now, he is a Professor of the
Department of Computer Science and Electronics. His research areas include
logic design and switching theory, representations of logic functions, and
multiple-valued logic. He has published more than nine books on logic
design, including Logic Synthesis and Optimization, Representation of
Discrete Functions, Switching Theory for Logic Synthesis, Logic Synthesis
and Verification, and Memory-Based Logic Synthesis, in 1993, 1996, 1999,
2001, and 2011, respectively. He has served as Program Chairman for the
IEEE International Symposium on Multiple-Valued Logic (ISMVL) many
times. Also, he was the Symposium Chairman of the 28th ISMVL held in
Fukuoka, Japan in 1998. He received the NIWA Memorial Award in 1979,
Takeda Techno-Entrepreneurship Award in 2001, and Distinctive Contribution
Awards from IEEE Computer Society MVL-TC for papers presented at
ISMVLs in 1986, 1996, 2003 and 2004. He has served as an associate editor
of the IEEE Transactions on Computers. He is a Fellow of the IEEE.
Munehiro Matsuura studied at the Kyushu
Institute of Technology from 1983 to 1989. He
received the B.E. degree in Natural Sciences
from the University of the Air, in Japan, 2003.
He has been working as a Technical Assistant at
the Kyushu Institute of Technology since 1991.
He has implemented several logic design algorithms
under the direction of Professor Tsutomu
Sasao. His interests include decision diagrams
and exclusive-OR based circuit design.