
SI-DFA: Sub-expression Integrated Deterministic
Finite Automata for Deep Packet Inspection
Authors: Ayesha Khalid, Rajat Sen, Anupam Chattopadhyay
Publisher: High Performance Switching and Routing (HPSR), 2013
Presenter: Pei-Hua Huang
Date: 2014/05/14
Department of Computer Science and Information Engineering
National Cheng Kung University, Taiwan R.O.C.
INTRODUCTION


There is a space-time trade-off: NFAs are compact but slow, whereas DFAs are fast but space-hungry
An ideal finite automaton should thus have the processing speed of a DFA and the space requirements of an NFA
Computer & Internet Architecture Lab
CSIE, National Cheng Kung University
STATE-EXPLOSION

A phenomenon called exponential state blowup (or state explosion) occurs when the regex corresponding to the NFA contains the following constructs
• Counting constraints
  • 1) .{n,m} : wildcard repetition between n and m times
  • 2) .{n,} : wildcard repetition at least n times
  • 3) .{n} : wildcard repetition exactly n times
• Kleene star (.*) conditions
  • unbounded wildcard repetitions
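The blowup caused by a counting constraint can be reproduced with a small subset construction. The sketch below is illustrative only: the unanchored pattern `.*a.{n}` over the two-letter alphabet `ab` and the helper name `dfa_state_count` are assumptions chosen for the demonstration and do not come from the paper.

```python
def dfa_state_count(n, alphabet="ab"):
    """Count reachable DFA states for the unanchored pattern .*a.{n}.

    Assumed NFA: state 0 loops on every character and moves to state 1
    on 'a'; states 1..n advance on any character; state n+1 accepts.
    """
    start = frozenset({0})
    seen = {start}
    stack = [start]
    while stack:
        subset = stack.pop()
        for ch in alphabet:
            nxt = {0}                      # state 0 is always active
            if ch == "a":
                nxt.add(1)                 # a new match attempt starts
            # every pending attempt advances by one wildcard position
            nxt.update(i + 1 for i in subset if 1 <= i <= n)
            nxt = frozenset(nxt)
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen)


for n in range(1, 6):
    print(f"n={n}: NFA states={n + 2}, DFA states={dfa_state_count(n)}")
```

The NFA grows linearly with n, while the DFA has to remember which of the last n+1 characters were `a`, so its state count doubles with every increment of n.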
SUB-EXPRESSION INTEGRATED DFA (SI-DFA)

Break an expression into parts at blowup conditions and merge them into an integrated DFA
• Break regexes into parts called sub-expressions, using Kleene star conditions as delimiters
• Create a merged DFA for all the sub-expressions
• The accepting states of the DFA are labeled as Final Accepting States (FAS) or Sub-expression Accepting States (SAS)
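The splitting step can be sketched in a few lines. This is a deliberately naive illustration: the function name `split_at_kleene_star` is hypothetical, and the plain string split does not handle escaped sequences such as `\.\*` or a `.*` inside a character class, which a real implementation would need to parse properly.

```python
def split_at_kleene_star(regex):
    """Split a regex at '.*' and label each part SAS or FAS.

    Every part except the last ends in a Sub-expression Accepting State;
    the last part ends in a Final Accepting State.
    """
    parts = [p for p in regex.split(".*") if p]
    labels = ["SAS"] * (len(parts) - 1) + ["FAS"]
    return list(zip(parts, labels))


print(split_at_kleene_star("ab.*cd"))  # [('ab', 'SAS'), ('cd', 'FAS')]
print(split_at_kleene_star("lm"))      # [('lm', 'FAS')]
```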
SUB-EXPRESSION INTEGRATED DFA (SI-DFA)


A regex is accepted if its constituent sub-expressions are accepted in the right order
A link bit, whose address is specified in an Association Table, is associated with every sub-expression
SUB-EXPRESSION INTEGRATED DFA (SI-DFA)

Ex. ab.*cd and lm
Consider a traffic trace cdablmcd
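The trace can be walked through with a minimal sketch of the link-bit mechanism. The function `si_dfa_scan` and the `rules` dictionary are hypothetical names, and a plain suffix check stands in for the merged DFA, so this shows only the ordering logic, not the automaton from the paper.

```python
def si_dfa_scan(rules, trace):
    """Return (rule, end_offset) pairs in the order they are accepted."""
    link_bits = set()   # (rule, k) is present once sub-expression k was accepted
    accepted = []
    for end in range(1, len(trace) + 1):
        prefix = trace[:end]
        for rule, parts in rules.items():
            # Check later sub-expressions first so that a link bit set at
            # this offset is not consumed at the same offset.
            for k in reversed(range(len(parts))):
                if not prefix.endswith(parts[k]):
                    continue
                if k > 0 and (rule, k - 1) not in link_bits:
                    continue                      # predecessor not seen yet
                if k == len(parts) - 1:
                    accepted.append((rule, end))  # Final Accepting State
                else:
                    link_bits.add((rule, k))      # Sub-expression Accepting State
    return accepted


rules = {"ab.*cd": ["ab", "cd"], "lm": ["lm"]}
print(si_dfa_scan(rules, "cdablmcd"))  # [('lm', 6), ('ab.*cd', 8)]
```

The leading `cd` is ignored because the link bit of `ab` is not yet set; `lm` is accepted on its own at offset 6, and the second `cd` completes `ab.*cd` at offset 8.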
Cases not Conforming with SI-DFA

Pseudo wildcard repetitions
• A forbidden character table is constructed; an occurrence of the forbidden character x invalidates the link bit of sub-expression ab
• Forbidden characters that occur in a subsequent sub-expression cannot be handled by SI-DFA
• Ex. RE = ab[^x]*cxd, input = abmcxd
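The failure in this example can be reproduced with a small sketch. The function `scan_with_forbidden` is a hypothetical stand-in that models only one link bit and one forbidden character table, with a suffix check in place of the merged DFA; Python's `re` module supplies the reference answer.

```python
import re


def scan_with_forbidden(first, second, forbidden, trace):
    """Two-part scan in which any forbidden character clears the link bit."""
    link_bit = False
    for end in range(1, len(trace) + 1):
        prefix = trace[:end]
        if link_bit and prefix.endswith(second):
            return True                 # second part accepted with link bit set
        if trace[end - 1] in forbidden:
            link_bit = False            # forbidden character invalidates the bit
        if prefix.endswith(first):
            link_bit = True             # first part accepted
    return False


# Conforming case: x does not occur in the second sub-expression.
print(scan_with_forbidden("ab", "cd", {"x"}, "abmcd"))    # True
# Non-conforming case: the x inside cxd clears the bit before cxd completes.
print(scan_with_forbidden("ab", "cxd", {"x"}, "abmcxd"))  # False
print(bool(re.search(r"ab[^x]*cxd", "abmcxd")))           # True
```

The last two lines disagree: the regex matches `abmcxd`, but the scan misses it because the `x` that belongs to `cxd` is treated as a forbidden character.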
Cases not Conforming with SI-DFA

Subsequent sub-expressions overlap
• SI-DFA should start matching a sub-expression only after the preceding sub-expression has already been accepted
• Ex. RE = ab.*bc, input = abc
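The overlap problem and the stated constraint can be illustrated as follows. Both functions are hypothetical sketches: `naive_scan` tracks only a link bit, while `guarded_scan` additionally records the offset at which the first sub-expression ended. The offset guard is an assumption about one way to enforce the constraint, not the mechanism described in the paper.

```python
import re


def naive_scan(first, second, trace):
    """Accept as soon as `second` ends while the link bit of `first` is set."""
    link_bit = False
    for end in range(1, len(trace) + 1):
        prefix = trace[:end]
        if link_bit and prefix.endswith(second):
            return True
        if prefix.endswith(first):
            link_bit = True
    return False


def guarded_scan(first, second, trace):
    """Accept `second` only if it starts at or after the end of `first`."""
    first_end = None
    for end in range(1, len(trace) + 1):
        prefix = trace[:end]
        starts_late_enough = (
            first_end is not None and end - len(second) >= first_end
        )
        if starts_late_enough and prefix.endswith(second):
            return True
        if first_end is None and prefix.endswith(first):
            first_end = end             # earliest end leaves the most room
    return False


print(naive_scan("ab", "bc", "abc"))          # True, a false positive
print(guarded_scan("ab", "bc", "abc"))        # False
print(bool(re.search(r"ab.*bc", "abc")))      # False
```

In `abc` the sub-expressions `ab` and `bc` share the character `b`, so a scan that looks only at the link bit reports a match that the regex `ab.*bc` does not have.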
Cases not Conforming with SI-DFA

Complete containment in subsequent sub-expressions
• SI-DFA will generate an erroneous result if a sub-expression in a regex is completely contained in its following sub-expression
• Ex. RE = a.*b.d, input = bad
Exact-match removal in .+ Cases


A ‘dot-plus’ condition, e.g., ab.+cd, should accept what ab.*cd accepts but must not accept abcd
This is done by first building a union automaton of L1 and L2 and then converting the accepting state due to L2 into a non-accepting state, where L1 = {ab, cd}, L2 = {abcd}, and L3 = L1 − L2
PERFORMANCE EVALUATION



• SI-DFA was developed in C++
• The testing platform is an AMD Phenom 1055T processor with 8 GB of RAM running Linux
• Rule-sets were extracted from Bro 2.0 [19], Snort [20], and Linux [21] rules