Simplification of CFG and
Normal Forms
Wen-Guey Tzeng
Computer Science Department
National Chiao Tung University
Normal Forms
• We want a CFG in either Chomsky or
Greibach normal form
– Chomsky normal form
• A → a, A → BC
– Greibach normal form
• A → ax, x ∈ V*
• CFGs in normal form are easier to parse
– The membership problem
– Given a grammar G and a string w, find the
parse tree for w if one exists.
– Example: w = x+y*z
• λ-free languages
– A language that does not contain λ, the empty string
• We consider CFG G such that L(G) is λ-free
• For any CFG G, there is G’ such that L(G’)=L(G)-{λ}
Transformation to normal forms: steps
• Start from a CFG G=(V, T, P, S) for a λ-free context-free language
• Remove from P, in this order, to get G’:
(1) λ-productions
(2) unit-productions
(3) useless productions
• Convert G’ to normal form
A substitution rule
• For A ≠ B,
A → x₁Bx₂, B → y₁|y₂|…|yₙ
is equivalent to
A → x₁y₁x₂|x₁y₂x₂|…|x₁yₙx₂, B → y₁|y₂|…|yₙ
• Example
– A → a|aaA|abBc, B → abbA|b
is equivalent to
A → a|aaA|ababbAc|abbc, B → abbA|b
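The substitution rule can be tried out on the example with a few lines of Python. This is only a minimal sketch under assumptions that are not in the slides: every symbol is a single character, uppercase letters are variables, lowercase letters are terminals, and the function name `substitute` is made up for illustration. It replaces the first occurrence of B in each right side of A.

```python
def substitute(productions, a, b):
    """Apply the substitution rule: expand variable b inside the right sides of a.

    productions maps a variable to a list of right sides (strings).
    Only the first occurrence of b in each right side is expanded.
    """
    if a == b:
        raise ValueError("the rule requires two distinct variables")
    new_rhs = []
    for rhs in productions[a]:
        pos = rhs.find(b)
        if pos < 0:
            candidates = [rhs]                      # b does not occur: keep as is
        else:
            x1, x2 = rhs[:pos], rhs[pos + 1:]
            candidates = [x1 + y + x2 for y in productions[b]]
        for candidate in candidates:
            if candidate not in new_rhs:            # avoid duplicate right sides
                new_rhs.append(candidate)
    result = dict(productions)
    result[a] = new_rhs
    return result


grammar = {"A": ["a", "aaA", "abBc"], "B": ["abbA", "b"]}
result = substitute(grammar, "A", "B")
print(result["A"])  # ['a', 'aaA', 'ababbAc', 'abbc']
```

The productions of B are left untouched, as in the rule itself; B only disappears later if it turns out to be useless.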
Remove λ-productions
• λ-production: A → λ
• Nullable variable A: A ⇒* λ
• Steps
1. Find the nullable variable set V_N
2. For each A → x₁x₂…xₘ, xᵢ ∈ V∪T,
• For each combination xᵢ, xⱼ, …, xₖ of variables in V_N,
add A → x₁…xᵢ₋₁xᵢ₊₁…xⱼ₋₁xⱼ₊₁…xₖ₋₁xₖ₊₁…xₘ
• Note: don’t add A → λ, if all xᵢ are in V_N
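The two steps can be sketched in Python as follows. This is a rough illustration, not a definitive implementation: it assumes single-character symbols, writes λ as the empty string `""`, and the names `nullable_variables` and `remove_lambda_productions` are hypothetical.

```python
from itertools import combinations


def nullable_variables(productions):
    """Step 1: collect every variable A with A =>* lambda."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, rhs_list in productions.items():
            if var in nullable:
                continue
            # "" is a lambda-production; all() over an empty string is True.
            if any(all(sym in nullable for sym in rhs) for rhs in rhs_list):
                nullable.add(var)
                changed = True
    return nullable


def remove_lambda_productions(productions):
    """Step 2: add every variant with some nullable variables dropped."""
    nullable = nullable_variables(productions)
    result = {}
    for var, rhs_list in productions.items():
        new_rhs = set()
        for rhs in rhs_list:
            positions = [i for i, sym in enumerate(rhs) if sym in nullable]
            for r in range(len(positions) + 1):
                for dropped in combinations(positions, r):
                    candidate = "".join(
                        s for i, s in enumerate(rhs) if i not in dropped
                    )
                    if candidate:       # never add A -> lambda
                        new_rhs.add(candidate)
        result[var] = new_rhs
    return result


grammar = {"S": ["ABaC"], "A": ["BC"], "B": ["b", ""], "C": ["D", ""], "D": ["d"]}
no_lambda = remove_lambda_productions(grammar)
print(sorted(nullable_variables(grammar)))  # ['A', 'B', 'C']
print(sorted(no_lambda["A"]))               # ['B', 'BC', 'C']
```

Right sides are stored in sets here, so the order in which alternatives are listed carries no meaning.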
Example
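• S → ABaC, A → BC, B → b|λ, C → D|λ, D → d
• Nullable set V_N={A, B, C}
• Add productions (and drop B → λ, C → λ)
– S → ABaC|BaC|AaC|ABa|aC|Aa|Ba|a
– A → BC|B|C
– B → b, C → D, D → d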
Remove unit-productions
• unit-production: A → B
• Steps
– Remove A → A immediately
– Draw dependency graph for variables A and B with:
A ⇒* B (using unit-productions only)
– For A ⇒* B and the non-unit productions B → y₁|y₂|…|yₙ
• Add A → y₁|y₂|…|yₙ
– Remove all A → B, where A and B are in
dependency graph
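A possible Python sketch of these steps is shown below. It assumes single-character symbols and a grammar given as a dictionary from variables to lists of right sides; the helper names `is_unit` and `remove_unit_productions` are invented for this illustration.

```python
def is_unit(rhs, variables):
    """A right side is a unit right side if it is exactly one variable."""
    return len(rhs) == 1 and rhs in variables


def remove_unit_productions(productions):
    variables = set(productions)
    # Dependency graph: edge a -> b for each unit-production a -> b, a != b.
    edges = {
        a: {rhs for rhs in rhs_list if is_unit(rhs, variables) and rhs != a}
        for a, rhs_list in productions.items()
    }
    result = {}
    for a in productions:
        # Variables b with a =>* b through unit-productions only.
        reachable, stack = set(), list(edges[a])
        while stack:
            b = stack.pop()
            if b not in reachable:
                reachable.add(b)
                stack.extend(edges[b])
        # Keep a's own non-unit productions, then copy those of each reachable b.
        new_rhs = {rhs for rhs in productions[a] if not is_unit(rhs, variables)}
        for b in reachable:
            new_rhs |= {rhs for rhs in productions[b] if not is_unit(rhs, variables)}
        result[a] = new_rhs
    return result


grammar = {"S": ["Aa", "B"], "B": ["A", "bb"], "A": ["a", "bc", "B"]}
no_unit = remove_unit_productions(grammar)
print(sorted(no_unit["S"]))  # ['Aa', 'a', 'bb', 'bc']
```

A depth-first search stands in for reading reachability off a drawn graph; any other reachability method would serve equally well.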
Example
• S → Aa|B, B → A|bb, A → a|bc|B
• Draw dependency graph: edges (S, B), (B, A), (A, B)
1. Remove unit productions
S → Aa, B → bb, A → a|bc
2. Add
S → bb|a|bc
A → bb
B → a|bc
3. Finally
S → a|bc|bb|Aa
A → a|bc|bb
B → a|bc|bb
Remove useless productions
• A variable A ∈ V is useful if S can generate
some terminal string through it.
– That is, S ⇒* xAy ⇒* w, w ∈ T*
• Example
– S → aSb|AB|Ba, A → aA, B → b|Bb, C → cB|c
– S ⇒ Ba ⇒ ba. Thus, B is useful.
– S is useful.
– But A and C are not useful (useless)
• Algorithm (removing useless productions)
Input: G=(V, T, P, S)
1. Find the Case 1 useless variables (those that cannot
generate a terminal string) and remove the productions
that use them.
2. Find the Case 2 useless variables (those unreachable
from S) and remove the productions that use them.
• Two cases for useless variables
– Case 1: variables that cannot generate strings in
T*
• S → aSb|AB|Ba, A → aA, B → b|Bb, C → cB|c
• Algorithm (finding variables that generate strings)
1. V₁={}
2. For each rule A → x with x ∈ (T∪V₁)*, add A to V₁
3. Repeat 2 until no more variables can be added to V₁
• V₁={S, B, C}
• Result: S → aSb|Ba, B → b|Bb, C → cB|c
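The Case 1 fixed-point computation might look like this in Python. The sketch assumes single-character symbols with lowercase terminals and uppercase variables; `generating_variables` and `keep_only` are hypothetical names.

```python
def generating_variables(productions):
    """Collect the variables that can derive some terminal string (the set V1)."""
    generating = set()
    changed = True
    while changed:
        changed = False
        for var, rhs_list in productions.items():
            if var in generating:
                continue
            for rhs in rhs_list:
                # Every symbol must be a terminal or an already accepted variable.
                if all(sym.islower() or sym in generating for sym in rhs):
                    generating.add(var)
                    changed = True
                    break
    return generating


def keep_only(productions, keep):
    """Drop every production that mentions a variable outside keep."""
    return {
        var: [
            rhs for rhs in rhs_list
            if all(sym.islower() or sym in keep for sym in rhs)
        ]
        for var, rhs_list in productions.items()
        if var in keep
    }


grammar = {"S": ["aSb", "AB", "Ba"], "A": ["aA"], "B": ["b", "Bb"], "C": ["cB", "c"]}
v1 = generating_variables(grammar)
pruned = keep_only(grammar, v1)
print(sorted(v1))   # ['B', 'C', 'S']
print(pruned["S"])  # ['aSb', 'Ba']
```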
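– Case 2: variables that cannot be reached from S
• S → aSb|Ba, B → b|Bb, C → cB|c
• Algorithm: dependency graph, with an edge (A, B) when B
appears in a right side of A; here the edges are
(S, S), (S, B), (B, B), (C, B)
• C is unreachable from S.
• Result: S → aSb|Ba, B → b|Bb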
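Reachability in the dependency graph can be computed without drawing it. Below is a small sketch under the same assumptions as before (single-character symbols, uppercase variables); the name `reachable_variables` is made up.

```python
def reachable_variables(productions, start):
    """Collect the variables reachable from the start variable."""
    reachable, stack = {start}, [start]
    while stack:
        var = stack.pop()
        for rhs in productions.get(var, []):
            for sym in rhs:
                # Each variable in a right side is a neighbour in the graph.
                if sym.isupper() and sym not in reachable:
                    reachable.add(sym)
                    stack.append(sym)
    return reachable


grammar = {"S": ["aSb", "Ba"], "B": ["b", "Bb"], "C": ["cB", "c"]}
reachable = reachable_variables(grammar, "S")
trimmed = {var: rhs for var, rhs in grammar.items() if var in reachable}
print(sorted(reachable))  # ['B', 'S']
print(trimmed)            # {'S': ['aSb', 'Ba'], 'B': ['b', 'Bb']}
```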
Chomsky normal form
• A CFG is in Chomsky normal form (CNF) if all
productions are of the form
A → BC, or A → a
• Example
– S → AS|a, A → SA|b
• Every CFG G, with λ ∉ L(G), has an equivalent
CNF grammar.
Converting into CNF
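1. Apply the rules for removing λ-, unit-, and
useless productions
2. Convert the productions into the form
A → C₁C₂…Cₙ, or A → a
(in each right side of length ≥ 2, replace every terminal a
by a new variable whose only production derives a)
3. Convert A → C₁C₂…Cₙ (n ≥ 3) into
A → C₁D₁, D₁ → C₂D₂, …, Dₙ₋₂ → Cₙ₋₁Cₙ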
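Steps 2 and 3 can be sketched in Python as below. Several things here are assumptions made for illustration only: right sides are tuples of symbols, step 1 is taken as already done, the new variable names `T_a` and `D1`, `D2`, … are invented and assumed not to clash with existing variables, and the small input grammar is just a test case.

```python
def to_cnf(productions):
    """Convert to CNF, assuming lambda-, unit- and useless productions are gone."""
    variables = set(productions)
    result = {var: [] for var in productions}
    terminal_var = {}   # terminal -> invented variable that derives it
    counter = 0         # numbering for the invented chain variables D1, D2, ...

    def var_for(terminal):
        if terminal not in terminal_var:
            name = "T_" + terminal
            terminal_var[terminal] = name
            result[name] = [(terminal,)]
        return terminal_var[terminal]

    for var, rhs_list in productions.items():
        for rhs in rhs_list:
            if len(rhs) == 1:
                result[var].append(rhs)     # A -> a is already in CNF
                continue
            # Step 2: replace terminals in long right sides by variables.
            symbols = [s if s in variables else var_for(s) for s in rhs]
            # Step 3: split A -> C1 C2 ... Cn into a chain of binary rules.
            head = var
            while len(symbols) > 2:
                counter += 1
                new_var = "D" + str(counter)
                result.setdefault(head, []).append((symbols[0], new_var))
                head, symbols = new_var, symbols[1:]
            result.setdefault(head, []).append(tuple(symbols))
    return result


grammar = {
    "S": [("A", "B", "a")],
    "A": [("a", "a", "b")],
    "B": [("A", "c")],
}
cnf = to_cnf(grammar)
for var, rhs_list in cnf.items():
    print(var, "->", " | ".join(" ".join(rhs) for rhs in rhs_list))
```

Tuples are used instead of strings because the invented variable names are longer than one character.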
Example
• S → ABa, A → aab, B → AC
• Step 2:
• Step 3:
Greibach normal form
• A CFG is in Greibach normal form (GNF) if all
productions are of the form
A → aB₁B₂…Bₙ, n ≥ 0
• Example
– S → aBC, B → aBA, A → a|bBSC
• Every CFG G, with λ ∉ L(G), has an equivalent
GNF grammar.
Example
• Example
– S → AB, A → aA|bB|b, B → b
– Result
• S → aAB|bBB|bB, A → aA|bB|b, B → b
• Example
– S → abSb|aa
– Result
• S → aBSB|aA, B → b, A → a
Parsing (membership)
• Question: Given a CFG G and a string w,
determine whether w ∈ L(G)
• Idea: the dynamic programming technique
– A large problem is decomposed into smaller
problems
– Combine solutions to smaller problems into a
solution for the large problem
• Assume that G is in CNF and w=a₁a₂…aₙ
• Use the dynamic programming technique
– Vᵢⱼ={ A ∈ V : A ⇒* aᵢaᵢ₊₁…aⱼ }
• Solve smaller problems Vᵢₖ, Vₖ₊₁,ⱼ, for k=i, i+1, …, j-1
• Combine them to compute Vᵢⱼ
• w = a₁a₂a₃ … aᵢaᵢ₊₁ … aⱼ₋₁aⱼ … aₙ
• Vᵢⱼ contains the variables
that generate aᵢaᵢ₊₁…aⱼ₋₁aⱼ
• [Figure: the substring aᵢ…aⱼ is split after position k into
aᵢ…aₖ (covered by Vᵢₖ) and aₖ₊₁…aⱼ (covered by Vₖ₊₁,ⱼ); the next
split uses Vᵢ,ₖ₊₁ and Vₖ₊₂,ⱼ, and so on]
CYK Algorithm
• Input: G=(V, T, P, S) in CNF and w=a₁a₂…aₙ
– Compute Vᵢⱼ={ A ∈ V : A ⇒* aᵢaᵢ₊₁…aⱼ }
– Compute in order of increasing substring length
• V₁₁, V₂₂, …, Vₙₙ
• V₁₂, V₂₃, …, Vₙ₋₁,ₙ
• …
• V₁ₙ
1. Smallest problem: add A to Vᵢᵢ
• if A → aᵢ is a production in P
2. Bigger problem: add A to Vᵢⱼ if
• for some k, i ≤ k ≤ j-1, A → BC in P, B in Vᵢₖ, C in Vₖ₊₁,ⱼ
3. w ∈ L(G) if and only if S ∈ V₁ₙ
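The table-filling order above translates fairly directly into Python. The following is a sketch rather than a tuned implementation: it assumes a CNF grammar with single-character symbols stored as a dictionary of right-side strings, uses 1-based index pairs (i, j) as dictionary keys to mirror Vᵢⱼ, and the function name `cyk` and the sample grammar S → AB, A → BB|a, B → AB|b are chosen for illustration.

```python
def cyk(productions, start, w):
    """Return (accepted, V) where V[i, j] holds the variables deriving w_i..w_j."""
    n = len(w)
    if n == 0:
        return False, {}        # a CNF grammar cannot derive the empty string
    V = {}
    # Smallest problems: substrings of length 1.
    for i in range(1, n + 1):
        V[i, i] = {A for A, rhs_list in productions.items() if w[i - 1] in rhs_list}
    # Bigger problems, in order of increasing substring length.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            V[i, j] = set()
            for k in range(i, j):               # i <= k <= j-1
                for A, rhs_list in productions.items():
                    for rhs in rhs_list:
                        if (len(rhs) == 2
                                and rhs[0] in V[i, k]
                                and rhs[1] in V[k + 1, j]):
                            V[i, j].add(A)
    return start in V[1, n], V


grammar = {"S": ["AB"], "A": ["BB", "a"], "B": ["AB", "b"]}
accepted, table = cyk(grammar, "S", "aabbb")
print(accepted)              # True
print(sorted(table[2, 3]))   # ['B', 'S']
```

The three nested loops over length, i and k give O(n³) table updates, each of which scans the productions once.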
Example
• S → AB, A → BB|a, B → AB|b
• w=aabbb
• Steps
– V₁₁={A}, V₂₂={A}, V₃₃={B}, V₄₄={B}, V₅₅={B}
– V₁₂=∅, V₂₃={S, B}, V₃₄={A}, V₄₅={A}
– V₁₃={S, B}, V₂₄={A}, V₃₅={S, B}
– V₁₄={A}, V₂₅={S, B}
– V₁₅={S, B}
• S ∈ V₁₅, so w ∈ L(G)
Sum up
• Context-free grammars are used in designing
programming languages, such as C, Pascal,
etc.
• The membership problem for CFGs is equivalent to
the parsing problem in programming
languages
• Normal forms are needed for “automatically”
generating a “parser” for the programming
language