Simplification of CFG and Normal Forms
Wen-Guey Tzeng
Computer Science Department
National Chiao Tung University

Normal Forms
• We want a CFG in either Chomsky or Greibach normal form
  – Chomsky normal form
    • A → a, A → BC
  – Greibach normal form
    • A → ax, x ∈ V*

• CFGs in normal form are easier to parse
  – The membership problem
  – Given a grammar G and a string w, find a parse tree for w if one exists.
  – Example string: w = x+y*z

• ε-free languages
  – A language that does not contain ε
• We consider CFGs G such that L(G) is ε-free
• For any CFG G, there is a CFG G' such that L(G') = L(G) − {ε}

Transformation to normal forms: steps
• Start with a CFG G = (V, T, P, S) for an ε-free context-free language
• Remove from P, to get G':
  (1) ε-productions
  (2) unit-productions
  (3) useless productions
• Convert G' to normal form

A substitution rule
• For A ≠ B,
    A → x1Bx2, B → y1 | y2 | … | yn
  is equivalent to
    A → x1y1x2 | x1y2x2 | … | x1ynx2, B → y1 | y2 | … | yn
• Example
  – A → a | aaA | abBc, B → abbA | b
    is equivalent to
    A → a | aaA | ababbAc | abbc, B → abbA | b

Remove ε-productions
• ε-production: A → ε
• Nullable variable A: A ⇒* ε
• Steps
  1. Find the set VN of nullable variables
  2. For each production A → x1x2…xm, xi ∈ V ∪ T,
     • For each combination xi, xj, …, xk of variables in VN, add the production with those variables deleted:
       A → x1…xi-1 xi+1…xj-1 xj+1…xk-1 xk+1…xm
     Note: do not add A → ε, even if all xi are in VN
  3. Remove all ε-productions

Example
• S → ABaC, A → BC, B → b | ε, C → D | ε, D → d
• Nullable set VN = {A, B, C}
• Add productions
  – S → BaC | AaC | ABa | aC | Aa | Ba | a
  – A → B | C
• Result after dropping the ε-productions
  – S → ABaC | BaC | AaC | ABa | aC | Aa | Ba | a, A → BC | B | C, B → b, C → D, D → d

Remove unit-productions
• unit-production: A → B
• Steps
  – Remove A → A immediately
  – Draw the dependency graph on the variables: an edge from A to B for each unit-production A → B, so that A ⇒* B exactly when B is reachable from A
  – For A ⇒* B and non-unit productions B → y1 | y2 | … | yn
    • Add A → y1 | y2 | … | yn
  – Remove all A → B, where A and B are in the dependency graph

Example
• S → Aa | B, B → A | bb, A → a | bc | B
• Draw the dependency graph (edges S → B, B → A, A → B)
  1. Remove the unit-productions
     S → Aa, B → bb, A → a | bc
  2. Add
     S → bb | a | bc
     A → bb
     B → a | bc
  3. Finally
     S → a | bc | bb | Aa
     A → a | bc | bb
     B → a | bc | bb

Remove useless productions
• A variable A ∈ V is useful if S can generate some terminal string through it.
  – That is, S ⇒* xAy ⇒* w, w ∈ T*
• Example
  – S → aSb | AB | Ba, A → aA, B → b | Bb, C → cB | c
  – S ⇒ Ba ⇒ ba. Thus, B is useful.
  – S is useful.
  – But A and C are not useful (they are useless)

• Algorithm (removing useless productions)
  Input: G = (V, T, P, S)
  1. Find the useless variables of Case 1 and remove the productions that contain them.
  2. Find the useless (unreachable) variables of Case 2 and remove the productions that contain them.

• Two cases for useless variables
  – Case 1: variables that cannot generate strings in T*
    • S → aSb | AB | Ba, A → aA, B → b | Bb, C → cB | c
    • Algorithm (finding the variables that generate terminal strings)
      1. V1 = {}
      2. For each rule A → x with x ∈ (T ∪ V1)*, add A to V1
      3. Repeat step 2 until no more variables can be added to V1
    • V1 = {S, B, C}
    • S → aSb | Ba, B → b | Bb, C → cB | c
  – Case 2: variables that cannot be reached from S
    • S → aSb | Ba, B → b | Bb, C → cB | c
    • Algorithm: draw the dependency graph on S, B, C (edges S → B and C → B)
    • C is unreachable from S.
    • S → aSb | Ba, B → b | Bb

Chomsky normal form
• A CFG is in Chomsky normal form (CNF) if all productions are of the form A → BC or A → a
• Example
  – S → AS | a, A → SA | b
• Every CFG G with ε ∉ L(G) has an equivalent CNF grammar.

Converting into CNF
1. Apply the rules for removing ε-, unit-, and useless productions
2. Convert the productions into the form A → C1C2…Cn (variables only, n ≥ 2) or A → a, by introducing a new variable Xa with Xa → a for each terminal a that occurs in a longer right-hand side
3. Convert each A → C1C2…Cn with n > 2 into A → C1D1, D1 → C2D2, …, Dn-2 → Cn-1Cn, where the Di are new variables

Example
• S → ABa, A → aab, B → Ac
• Step 2: S → ABXa, A → XaXaXb, B → AXc, Xa → a, Xb → b, Xc → c
• Step 3: S → AD1, D1 → BXa, A → XaD2, D2 → XaXb, B → AXc, Xa → a, Xb → b, Xc → c

Greibach normal form
• A CFG is in Greibach normal form (GNF) if all productions are of the form A → aB1B2…Bn, n ≥ 0
• Example
  – S → aBC, B → aBA, A → a | bBSC
• Every CFG G with ε ∉ L(G) has an equivalent GNF grammar.
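
The two-case procedure for removing useless productions (first the variables that cannot generate a terminal string, then the variables that are unreachable from S) can be sketched in Python as follows. This is only a minimal sketch under assumptions that are not part of the slides: a grammar is a dict from a variable to a list of right-hand-side strings, uppercase letters are variables, lowercase letters are terminals, and all function names are hypothetical.

```python
def generating_variables(productions):
    """Case 1: collect the variables that can derive some terminal string."""
    generating = set()
    changed = True
    while changed:  # repeat until no variable can be added
        changed = False
        for head, bodies in productions.items():
            if head in generating:
                continue
            for body in bodies:
                # every symbol is a terminal or an already-generating variable
                if all(not s.isupper() or s in generating for s in body):
                    generating.add(head)
                    changed = True
                    break
    return generating


def reachable_variables(productions, start):
    """Case 2: collect the variables reachable from start in the dependency graph."""
    reachable = {start}
    stack = [start]
    while stack:
        head = stack.pop()
        for body in productions.get(head, []):
            for s in body:
                if s.isupper() and s not in reachable:
                    reachable.add(s)
                    stack.append(s)
    return reachable


def keep_only(productions, keep):
    """Drop every production that mentions a variable outside `keep`."""
    return {
        head: [b for b in bodies if all(not s.isupper() or s in keep for s in b)]
        for head, bodies in productions.items()
        if head in keep
    }


def remove_useless(productions, start):
    # Case 1 first, then Case 2 on the reduced grammar.
    step1 = keep_only(productions, generating_variables(productions))
    return keep_only(step1, reachable_variables(step1, start))


# Example grammar from the slides (encoding assumed as described above).
grammar = {"S": ["aSb", "AB", "Ba"], "A": ["aA"], "B": ["b", "Bb"], "C": ["cB", "c"]}
result = remove_useless(grammar, "S")
print(result)
```

On the example grammar, Case 1 keeps S, B and C, and Case 2 then drops C, which leaves S → aSb | Ba and B → b | Bb. Running Case 1 before Case 2 matters: removing non-generating variables can make further variables unreachable.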
Example
• Example
  – S → AB, A → aA | bB | b, B → b
  – Result
    • S → aAB | bBB | bB, A → aA | bB | b, B → b
• Example
  – S → abSb | aa
  – Result
    • S → aBSB | aA, B → b, A → a

Parsing (membership)
• Question: given a CFG G and a string w, determine whether w ∈ L(G)
• Idea: the dynamic programming technique
  – A large problem is decomposed into smaller problems
  – Solutions to the smaller problems are combined into a solution for the large problem

• Assume that G is in CNF and w = a1a2…an
• Use the dynamic programming technique
  – Vij = { A ∈ V : A ⇒* ai ai+1…aj }
• Solve the smaller problems Vik, Vk+1,j for k = i, i+1, …, j-1
• Combine them to compute Vij

[Figure: w = a1 a2 a3 … ai ai+1 … aj-1 aj … an. Vij contains the variables that generate ai ai+1…aj-1 aj; it is computed from the pairs (Vik, Vk+1,j), (Vi,k+1, Vk+2,j), … over all split points.]

CYK Algorithm
• Input: G = (V, T, S, P) in CNF and w = a1a2…an
  – Compute Vij = { A ∈ V : A ⇒* ai ai+1…aj }
  – Compute in the order
    • V11, V22, …, Vnn
    • V12, V23, …, Vn-1,n
    • …
    • V1n
  1. Smallest problem: add A to Vii
     • if A → ai is a production in P
  2. Bigger problem: add A to Vij if
     • for some k, i ≤ k ≤ j-1, A → BC is in P, B is in Vik, and C is in Vk+1,j
  3. w ∈ L(G) if and only if S ∈ V1n

Example
• S → AB, A → BB | a, B → AB | b
• w = aabbb
• Steps
  – V11 = {A}, V22 = {A}, V33 = {B}, V44 = {B}, V55 = {B}
  – V12 = ∅, V23 = {S, B}, V34 = {A}, V45 = {A}
  – V13 = {S, B}, V24 = {A}, V35 = {S, B}
  – V14 = {A}, V25 = {S, B}
  – V15 = {S, B}
• S ∈ V15, so w ∈ L(G)

Sum up
• Context-free grammars are used in designing programming languages, such as C, Pascal, etc.
• The membership problem for CFGs is equivalent to the parsing problem in programming languages
• Normal forms are needed for "automatically" generating a "parser" for a programming language
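
The CYK table-filling procedure described above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation, and it rests on assumptions that are not in the slides: the CNF grammar is passed as two lists of tuples, (A, a) for A → a and (A, B, C) for A → BC, and the names cyk_table, is_member, TERMINAL_RULES and BINARY_RULES are hypothetical. The example grammar is taken to be S → AB, A → BB | a, B → AB | b, with B → b assumed because the listed entry V33 = {B} requires it.

```python
def cyk_table(terminal_rules, binary_rules, w):
    """Fill V[(i, j)] = set of variables deriving a_i ... a_j (1-indexed, inclusive)."""
    n = len(w)
    V = {}
    # Smallest problems: A is in V[(i, i)] if A -> a_i is a production.
    for i in range(1, n + 1):
        V[(i, i)] = {A for (A, a) in terminal_rules if a == w[i - 1]}
    # Bigger problems, in order of increasing substring length.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            V[(i, j)] = set()
            for k in range(i, j):  # split point, i <= k <= j-1
                for (A, B, C) in binary_rules:
                    if B in V[(i, k)] and C in V[(k + 1, j)]:
                        V[(i, j)].add(A)
    return V


def is_member(terminal_rules, binary_rules, start, w):
    """w is in L(G) iff the start variable is in V[(1, n)]."""
    if not w:
        return False  # a CNF grammar of this form cannot generate the empty string
    V = cyk_table(terminal_rules, binary_rules, w)
    return start in V[(1, len(w))]


# Example grammar: S -> AB, A -> BB | a, B -> AB | b (encoding assumed as above).
TERMINAL_RULES = [("A", "a"), ("B", "b")]
BINARY_RULES = [("S", "A", "B"), ("A", "B", "B"), ("B", "A", "B")]

table = cyk_table(TERMINAL_RULES, BINARY_RULES, "aabbb")
print(sorted(table[(1, 5)]))
print(is_member(TERMINAL_RULES, BINARY_RULES, "S", "aabbb"))  # True
```

The three nested loops over length, start position, and split point give O(n^3) table updates, each scanning the list of binary rules. Keying the table by (i, j) keeps the code close to the Vij notation at the cost of a dict lookup per cell.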
© Copyright 2024