COMP 335: Introduction to Theoretical Computer Science: Fall 2014 Assignment 4

Due November 11, 2014 at midnight
1. Let G be any context-free grammar without any λ-productions or unit productions. Let k be
the maximum number of symbols on the right side of any production in P. Show that there
is an equivalent grammar in Chomsky Normal Form that has no more than $(k-1)|P| + |T|$
production rules.
Ans. Suppose first that G is already in CNF. If k = 1, every production has the form A → a with
a ∈ T, so no variable other than S is reachable from S; after removing useless productions, every
production has the form S → a, and G has at most $|T| \le (k-1)|P| + |T|$ productions. If instead
k ≥ 2, the number of productions is $|P| \le (k-1)|P| + |T|$. So we may assume that G is not in CNF.
We will convert G to CNF and count the productions in the resulting grammar. Since G has no
λ-productions or unit productions but is not in CNF, some right-hand side has at least two symbols,
so k ≥ 2. Removing useless productions can only reduce the number of productions, so we consider
the two remaining steps of the conversion to CNF and show that the grammar they produce has at
most the stated number of productions. In the first step, each terminal a that occurs in a
right-hand side of length at least 2 is replaced by a new variable $T_a$, and a new production
$T_a \to a$ is added. This adds at most |T| productions, one per terminal. In the second step,
every production $A \to A_1 A_2 \cdots A_i$ with i > 2 is replaced by the i − 1 productions
$A \to A_1 B_1,\ B_1 \to A_2 B_2,\ \ldots,\ B_{i-2} \to A_{i-1} A_i$. Since i ≤ k, every original
production gives rise to at most k − 1 productions (a production with i ≤ 2 is kept as a single
production, and 1 ≤ k − 1). Hence the |P| original productions become at most $(k-1)|P|$
productions, for a total of at most $(k-1)|P| + |T|$ productions.
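As a sanity check of the counting argument, here is a small Python sketch of the two conversion steps. It assumes, without checking, that the input grammar has no λ-productions, unit productions or useless symbols; it writes variables in upper case and terminals in lower case, and the fresh names <T_a> and <B1>, <B2>, ... are illustrative choices, not part of the original solution. The example grammar is the one from Problem 3(a) after its unit production has been removed.

```python
from typing import Dict, List, Tuple

# A production is (lhs, rhs); variables are upper-case names, terminals lower-case.
Production = Tuple[str, Tuple[str, ...]]

def to_cnf(productions: List[Production]) -> List[Production]:
    """Apply the two CNF steps from the answer. Assumes (without checking) that
    there are no λ-productions, unit productions or useless symbols."""
    term_var: Dict[str, str] = {}  # terminal a -> fresh variable <T_a>
    step1: List[Production] = []
    for lhs, rhs in productions:
        if len(rhs) >= 2:
            # Step 1: replace every terminal in a right-hand side of length >= 2.
            rhs = tuple(term_var.setdefault(s, f"<T_{s}>") if s.islower() else s
                        for s in rhs)
        step1.append((lhs, rhs))
    step1 += [(var, (a,)) for a, var in term_var.items()]  # at most |T| rules

    result: List[Production] = []
    fresh = 0
    for lhs, rhs in step1:
        # Step 2: A -> A1 A2 ... Ai (i > 2) becomes i - 1 binary productions.
        while len(rhs) > 2:
            fresh += 1
            new_var = f"<B{fresh}>"
            result.append((lhs, (rhs[0], new_var)))
            lhs, rhs = new_var, rhs[1:]
        result.append((lhs, rhs))
    return result

# Grammar of Problem 3(a) after removing the unit production B -> A.
P: List[Production] = [
    ("S", ("a", "A", "B", "B")), ("S", ("a", "A", "A")),
    ("A", ("a", "B", "B")), ("A", ("a",)),
    ("B", ("b", "B", "B")), ("B", ("a", "B", "B")), ("B", ("a",)),
]
k = max(len(rhs) for _, rhs in P)
terminals = {s for _, rhs in P for s in rhs if s.islower()}
cnf = to_cnf(P)
print(len(cnf), "<=", (k - 1) * len(P) + len(terminals))  # prints: 15 <= 23
```

Here k = 4, |P| = 7 and |T| = 2, so the bound is 23, while the conversion produces 15 productions.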
2. Let G be a context-free grammar in CNF, and let w ∈ L(G) be the yield of a parse tree for
w according to the grammar G. Prove using induction that if the length of the longest path
in the tree is n, then $|w| \le 2^{n-1}$.
Ans. We use induction on n.
Basis: If n = 1, the tree consists of a single production of the form S → w. Since G is in CNF
and w ∈ T*, w is a single terminal, so $|w| = 1 = 2^0$ as needed.
Induction step: Assume that, for every variable A, the yield of any parse tree rooted at A whose
longest path has length at most k has length at most $2^{k-1}$. Now consider a parse tree whose
longest path has length k + 1 ≥ 2. A root production of the form S → a would give a tree whose
longest path has length 1, so the root production must be of the form S → AB. Let $w_1$ be the
yield of the subtree rooted at A and $w_2$ the yield of the subtree rooted at B. The longest paths
in both subtrees have length at most k, so $|w_1| \le 2^{k-1}$ and $|w_2| \le 2^{k-1}$. Since
$w = w_1 w_2$, we conclude that $|w| \le 2 \cdot 2^{k-1} = 2^k$ as needed.
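The bound can also be checked exhaustively for small n. In the Python sketch below (an illustration, not part of the proof), a CNF parse tree is reduced to its shape: a node using a rule A → a is None, and a node using a rule A → BC is a pair of subtrees, since variable names affect neither path lengths nor yields. Path lengths are counted in edges, matching the basis case where a single production S → a has longest path 1. The check also shows that the bound is attained by complete trees.

```python
from typing import List, Optional, Tuple

# Shape of a CNF parse tree: None is a node that uses a rule A -> a (one
# terminal leaf below it); a pair (left, right) is a node using A -> BC.
Shape = Optional[Tuple["Shape", "Shape"]]

def shapes(max_height: int) -> List[Shape]:
    """All shapes whose longest path (counted in edges) is at most max_height."""
    if max_height < 1:
        return []
    smaller = shapes(max_height - 1)
    return [None] + [(left, right) for left in smaller for right in smaller]

def height(t: Shape) -> int:
    """Length of the longest path from the root variable to a terminal."""
    return 1 if t is None else 1 + max(height(t[0]), height(t[1]))

def yield_length(t: Shape) -> int:
    """Number of terminals in the yield."""
    return 1 if t is None else yield_length(t[0]) + yield_length(t[1])

for n in range(1, 6):
    trees = [t for t in shapes(n) if height(t) == n]
    longest = max(yield_length(t) for t in trees)
    assert longest <= 2 ** (n - 1)
    print(n, len(trees), longest)
```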
3. Convert the following grammars to push-down automata using the standard procedure:
(a) S −→ aABB | aAA
A −→ aBB | a
B −→ bBB | A
First we convert the grammar to Greibach Normal Form by replacing the unit production B → A with the right-hand sides of A.
S −→ aABB | aAA
A −→ aBB | a
B −→ bBB | aBB | a
Next we convert it to a PDA.
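The resulting PDA has states q0, q1 and qf, where q0 is the initial state, qf is the final state
and z is the initial stack symbol. Its transitions are:
from q0 to q1: λ, z → Sz
from q1 to itself, one for each production: a, S → ABB; a, S → AA; a, A → BB; a, A → λ;
b, B → BB; a, B → BB; a, B → λ
from q1 to qf: λ, z → z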
(b) S −→ aSb | bSa | ab | ba
First we convert the grammar to Greibach Normal Form, introducing variables A and B for the
terminals a and b that do not begin a right-hand side.
S −→ aSB | bSA | aB | bA
A → a
B → b
Next we convert it to a PDA.
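The resulting PDA has states q0, q1 and qf, where q0 is the initial state, qf is the final state
and z is the initial stack symbol. Its transitions are:
from q0 to q1: λ, z → Sz
from q1 to itself, one for each production: a, S → SB; a, S → B; b, S → SA; b, S → A;
a, A → λ; b, B → λ
from q1 to qf: λ, z → z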
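To try both machines on concrete inputs, the Python sketch below simulates the three-state PDAs of (a) and (b), exploring all nondeterministic choices at once. It represents a stack as a string with its top first, and it discards stacks that hold more variables than there are input symbols left, which is safe only because every transition in q1 consumes one input symbol (the grammars are in GNF). The direct description used for (b), that a string is in the language exactly when it has even length at least 2 and each symbol differs from its mirror-image symbol, is our reading of S → aSb | bSa | ab | ba; the example strings for (a) were derived by hand from its grammar.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Loop transitions of state q1: (input symbol, stack top) -> strings to push.
# A stack is a str with its top at index 0; "z" is the bottom marker.
Rules = Dict[Tuple[str, str], List[str]]

RULES_3A: Rules = {
    ("a", "S"): ["ABB", "AA"],
    ("a", "A"): ["BB", ""],
    ("b", "B"): ["BB"],
    ("a", "B"): ["BB", ""],
}
RULES_3B: Rules = {
    ("a", "S"): ["SB", "B"],
    ("b", "S"): ["SA", "A"],
    ("a", "A"): [""],
    ("b", "B"): [""],
}

def accepts(rules: Rules, w: str) -> bool:
    """Explore every run of the PDA q0 -> q1 -> qf on w, breadth-first."""
    stacks: Set[str] = {"Sz"}  # after q0 --(λ, z -> Sz)--> q1
    for pos, ch in enumerate(w):
        remaining = len(w) - pos - 1
        nxt: Set[str] = set()
        for st in stacks:
            for push in rules.get((ch, st[0]), []):
                new = push + st[1:]
                # Each variable left on the stack must still consume at least
                # one symbol, so longer stacks cannot lead to acceptance.
                if len(new) - 1 <= remaining:
                    nxt.add(new)
        stacks = nxt
    # q1 --(λ, z -> z)--> qf is possible only when just z is left.
    return "z" in stacks

def in_3b(w: str) -> bool:
    """Direct description of L(S -> aSb | bSa | ab | ba)."""
    n = len(w)
    return n >= 2 and n % 2 == 0 and all(w[i] != w[n - 1 - i] for i in range(n))

print([w for w in ("aaa", "aaaa", "aabaaa", "aa", "ab") if accepts(RULES_3A, w)])
for n in range(9):
    for t in product("ab", repeat=n):
        w = "".join(t)
        assert accepts(RULES_3B, w) == in_3b(w), w
```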
4. Determine whether or not the following languages on Σ = {a, b, c, d} are context-free. Explain
your answers.
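(a) $L_1 = \{a^n b^j c^k d^l \mid n \le j,\ k \le l\}$
Ans. $L_1$ is context-free. It is the concatenation of $\{a^n b^j \mid n \le j\}$ and
$\{c^k d^l \mid k \le l\}$, which are generated by the variables A and B, respectively, in the
following CFG.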
S → AB
A → aAb | Ab | λ
B → cBd | Bd | λ
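For small lengths the grammar can be compared with the definition of $L_1$ by brute force. The Python sketch below enumerates leftmost derivations, writing variables in upper case and λ as the empty string, and discards any sentential form that already contains more terminals than the length bound. This pruning is enough here because every production except S → AB and the λ-productions adds at least one terminal; for an arbitrary grammar the enumeration need not terminate.

```python
import re
from itertools import product
from typing import Dict, List, Set

# Grammar for L1; upper-case letters are variables, "" is a λ right-hand side.
G1: Dict[str, List[str]] = {
    "S": ["AB"],
    "A": ["aAb", "Ab", ""],
    "B": ["cBd", "Bd", ""],
}

def generated(grammar: Dict[str, List[str]], start: str, max_len: int) -> Set[str]:
    """Terminal strings of length <= max_len that start derives."""
    found: Set[str] = set()
    seen: Set[str] = set()
    todo = [start]
    while todo:
        form = todo.pop()
        if form in seen:
            continue
        seen.add(form)
        var = next((i for i, s in enumerate(form) if s.isupper()), None)
        if var is None:          # no variables left: a terminal string
            found.add(form)
            continue
        for rhs in grammar[form[var]]:
            new = form[:var] + rhs + form[var + 1:]  # leftmost derivation step
            if sum(s.islower() for s in new) <= max_len:
                todo.append(new)
    return found

def in_L1(w: str) -> bool:
    m = re.fullmatch(r"(a*)(b*)(c*)(d*)", w)
    return m is not None and len(m[1]) <= len(m[2]) and len(m[3]) <= len(m[4])

expected = {"".join(t) for n in range(7) for t in product("abcd", repeat=n)
            if in_L1("".join(t))}
assert generated(G1, "S", 6) == expected
```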
(b) $L_2 = \{a^n b^j c^k d^l \mid n \le k,\ j \le l\}$
Ans. $L_2$ is not context-free. Suppose it is context-free, and let m be the constant of the
pumping lemma. Choose $w = a^m b^m c^m d^m$. Clearly $w \in L_2$ and |w| ≥ m. Let w = uvxyz
with |vxy| ≤ m and |vy| ≥ 1. Since the a's are separated from the c's by m b's, and the b's from
the d's by m c's, vy cannot contain both a's and c's and cannot contain both b's and d's. If vy
contains a's but no c's, we pump up, and if vy contains c's but no a's, we pump down. In both
situations, we obtain a string with more a's than c's. Similarly, if vy contains b's but no d's,
we pump up, and if vy contains d's but no b's, we pump down. In both situations, we obtain a
string with more b's than d's. Since vy is nonempty, these cases are exhaustive, and in every
case we obtain a string not in $L_2$, a contradiction to the pumping lemma. Therefore $L_2$
is not context-free.
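(c) $L_3 = \{w_1 c w_2 \mid w_1, w_2 \in (a+b)^*,\ w_1 \ne w_2\}$
Ans. $L_3$ is a context-free language. Two strings $w_1 \ne w_2$ either have different lengths or
differ at some position within their common length, so $L_3 = L_A \cup L_B$, where
$L_A = \{w_1 c w_2 \mid w_1, w_2 \in (a+b)^*,\ |w_1| \ne |w_2|\}$ and
$L_B = \{w_1 c w_2 \mid w_1, w_2 \in (a+b)^*, \text{ the } i\text{th symbols of } w_1 \text{ and } w_2 \text{ differ for some } i \le \min(|w_1|, |w_2|)\}$.
Since context-free languages are closed under union, it suffices to show that $L_A$ and $L_B$ are
context-free. $L_A$ is generated, for instance, by the following grammar, in which S produces
equally many symbols on each side of the c, A then adds at least one extra symbol before the c,
and B adds at least one extra symbol after it: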
S → XSX | A | B
A → XA | Xc
B → BX | cX
X→a|b
Next we give a grammar for LB . The key idea is to generate strings of length i − 1 before
and after the c before generating the non-matching symbol, as shown in the grammar
below.
S → BaD | AbD
B → XBX | bC
A → XAX | aC
C → XC | c
D → XD | λ
X→a|b
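We prove that the grammar above generates $L_B$. Observe that:
i. $B \Rightarrow^* X^{i-1} b C X^{i-1}$ for all i ≥ 1.
ii. $X^i \Rightarrow^* x$ for every $x \in (a+b)^*$ with |x| = i, for all i ≥ 1.
iii. $C \Rightarrow^* yc$ for all $y \in (a+b)^*$.
iv. $D \Rightarrow^* z$ for all $z \in (a+b)^*$.
Therefore, starting with the production S → BaD, we obtain
$S \Rightarrow^* X^{i-1} b C X^{i-1} a D \Rightarrow^* x_1 b y c x_2 a z$, with $x_1, x_2, y, z \in (a+b)^*$ and $|x_1| = |x_2| = i - 1$.
Similarly, starting with the production S → AbD, we conclude that
$S \Rightarrow^* x_1 a y c x_2 b z$, with $x_1, x_2, y, z \in (a+b)^*$ and $|x_1| = |x_2| = i - 1$.
These are the only strings that S derives. Taking $w_1 = x_1 b y$ and $w_2 = x_2 a z$ (or
$w_1 = x_1 a y$ and $w_2 = x_2 b z$), the ith symbols of $w_1$ and $w_2$ differ, and every string
of $L_B$ can be written in one of these two forms; therefore the above grammar generates $L_B$.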
We can also give a PDA for $L_B$ as follows. The idea of the PDA is that we "guess" the
value of i above. We push the stack symbol X onto the stack for each of the first i − 1 input
symbols; then, on the ith symbol, we go to one of two states depending on whether it is an a
or a b. At this point there are i − 1 X's on the stack. We keep reading input symbols without
altering the stack until we reach the c. After the c, we pop one X for each input symbol read,
until the bottom-of-stack marker z is on top. At that point we have read i − 1 symbols after the
c. If the next symbol differs from the ith symbol of the string before the c, we go to a final
state, where the remaining a's and b's are read without altering the stack.
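The following Python sketch checks, on all strings over {a, b, c} of length at most 8, both the decomposition $L_3 = L_A \cup L_B$ and the PDA idea for $L_B$. It replaces the nondeterministic guess of i by trying every i in turn, and it keeps the stack as an explicit list with "z" at the bottom; the helper names (split_at_c, in_LB, run_with_guess, pda_LB and so on) are ours. Indices in the code are 0-based, whereas the text counts positions from 1.

```python
from itertools import product
from typing import Optional, Tuple

def split_at_c(s: str) -> Optional[Tuple[str, str]]:
    """(w1, w2) if s = w1 c w2 with w1, w2 over {a, b}; otherwise None."""
    if s.count("c") != 1 or not set(s) <= set("abc"):
        return None
    w1, w2 = s.split("c")
    return w1, w2

def in_L3(s: str) -> bool:
    p = split_at_c(s)
    return p is not None and p[0] != p[1]

def in_LA(s: str) -> bool:
    p = split_at_c(s)
    return p is not None and len(p[0]) != len(p[1])

def in_LB(s: str) -> bool:
    p = split_at_c(s)
    return p is not None and any(
        p[0][k] != p[1][k] for k in range(min(len(p[0]), len(p[1]))))

def run_with_guess(s: str, i: int) -> bool:
    """One branch of the PDA for L_B, for a fixed guess i >= 1."""
    stack, pos = ["z"], 0
    for _ in range(i - 1):              # push one X per symbol before the ith
        if pos >= len(s) or s[pos] not in "ab":
            return False
        stack.append("X")
        pos += 1
    if pos >= len(s) or s[pos] not in "ab":
        return False
    remembered = s[pos]                 # kept in the finite control
    pos += 1
    while pos < len(s) and s[pos] in "ab":  # rest of w1, stack untouched
        pos += 1
    if pos >= len(s) or s[pos] != "c":
        return False
    pos += 1
    while stack[-1] == "X":             # pop one X per symbol after the c
        if pos >= len(s) or s[pos] not in "ab":
            return False
        stack.pop()
        pos += 1
    if pos >= len(s) or s[pos] not in "ab" or s[pos] == remembered:
        return False
    # Final state: the remaining input must consist of a's and b's only.
    return all(ch in "ab" for ch in s[pos + 1:])

def pda_LB(s: str) -> bool:
    return any(run_with_guess(s, i) for i in range(1, len(s) + 1))

for n in range(9):
    for t in product("abc", repeat=n):
        s = "".join(t)
        assert in_L3(s) == (in_LA(s) or in_LB(s)), s
        assert pda_LB(s) == in_LB(s), s
```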
(d) $L_4 = \{wcw \mid w \in (a+b)^*\}$
Ans. Not context-free. We use the pumping lemma to prove that it is not context-free.
Assume $L_4$ is context-free, let m be the constant of the pumping lemma, and
choose the string $w = a^m b^m c a^m b^m$. Clearly $w \in L_4$ and |w| ≥ m. Let w = uvxyz with
|vxy| ≤ m and |vy| ≥ 1. We consider the following exhaustive cases:
Case 1: vy contains the symbol c. Then $uv^2xy^2z$ has more than one c, and therefore
cannot belong to $L_4$.
Case 2: vy is chosen from the substring before the c. Then $vy = a^i b^j$ with i + j ≥ 1.
Therefore $uxz = a^{m-i} b^{m-j} c a^m b^m \notin L_4$, since $a^{m-i} b^{m-j} \ne a^m b^m$ as
either i ≥ 1 or j ≥ 1 (or both).
Case 3: vy is chosen from the substring after the c. Then $vy = a^i b^j$ with i + j ≥ 1.
Therefore $uxz = a^m b^m c a^{m-i} b^{m-j} \notin L_4$, since $a^{m-i} b^{m-j} \ne a^m b^m$ as
either i ≥ 1 or j ≥ 1 (or both).
Case 4: vy contains symbols both from the substring of w before the c and from the substring
after the c (but does not contain c). Then, since |vxy| ≤ m, it must be that $v = b^i$ and
$y = a^j$ with i, j ≥ 1. Therefore $uxz = a^m b^{m-i} c a^{m-j} b^m \notin L_4$, as
$a^m b^{m-i} \ne a^{m-j} b^m$.
In all cases, we arrive at a string not in $L_4$, which contradicts the pumping lemma.
Therefore $L_4$ cannot be context-free.
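The two pumping-lemma arguments, for $L_2$ in (b) and $L_4$ in (d), can be checked mechanically for small values of m. The Python sketch below enumerates every decomposition w = uvxyz with |vxy| ≤ m and |vy| ≥ 1 of the chosen strings and confirms that none of them keeps $uv^ixy^iz$ in the language for i = 0, 1, 2, 3. This only illustrates the case analysis for fixed m; it does not replace the proofs, which cover every m.

```python
import re
from typing import Callable, Iterator, Tuple

def in_L2(s: str) -> bool:
    """a^n b^j c^k d^l with n <= k and j <= l."""
    m = re.fullmatch(r"(a*)(b*)(c*)(d*)", s)
    return m is not None and len(m[1]) <= len(m[3]) and len(m[2]) <= len(m[4])

def in_L4(s: str) -> bool:
    """w c w with w over {a, b}."""
    if s.count("c") != 1 or not set(s) <= set("abc"):
        return False
    w1, w2 = s.split("c")
    return w1 == w2

def decompositions(w: str, m: int) -> Iterator[Tuple[str, str, str, str, str]]:
    """All w = uvxyz with |vxy| <= m and |vy| >= 1."""
    n = len(w)
    for start in range(n + 1):
        for end in range(start, min(n, start + m) + 1):  # vxy = w[start:end]
            for p in range(start, end + 1):              # v = w[start:p]
                for q in range(p, end + 1):              # x = w[p:q], y = w[q:end]
                    if (p - start) + (end - q) >= 1:
                        yield w[:start], w[start:p], w[p:q], w[q:end], w[end:]

def some_decomposition_pumps(w: str, m: int, lang: Callable[[str], bool]) -> bool:
    """True if some decomposition keeps uv^i xy^i z in lang for i = 0..3."""
    return any(all(lang(u + v * i + x + y * i + z) for i in range(4))
               for u, v, x, y, z in decompositions(w, m))

for m in range(1, 6):
    w2 = "a" * m + "b" * m + "c" * m + "d" * m        # string chosen in (b)
    w4 = "a" * m + "b" * m + "c" + "a" * m + "b" * m  # string chosen in (d)
    assert in_L2(w2) and in_L4(w4)
    assert not some_decomposition_pumps(w2, m, in_L2)
    assert not some_decomposition_pumps(w4, m, in_L4)
```

As a contrast, once |vxy| may exceed the length of the b-block, the string abcd can be pumped (v = a, y = c), which is why the proofs choose blocks of length m.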
5. Consider the language $L = \{a^i b^j c^k \mid i \ne j,\ j \ne k,\ i \ne k\}$; it is not a context-free language.
Show that it nevertheless satisfies the conditions of the context-free pumping lemma, that is,
show that there exists an m such that every string w in L of length at least m can be written as
w = uvxyz with |vxy| ≤ m and |vy| ≥ 1, such that $uv^i xy^i z \in L$ for all i ≥ 0.
Ans. Let m = 4 (m = 3 is too small: every string of L has length at least 0 + 1 + 2 = 3, so a
string of length 3 such as aab cannot be pumped down). Consider any string $w = a^i b^j c^k \in L$
with |w| ≥ 4. Since i ≠ j, j ≠ k and i ≠ k, there is a strict ordering among the three counts.
We will break up w so that vy consists of only one type of symbol, namely the symbol that occurs
most often in w, and we claim that all pumped strings are still in L. We treat the case
i > j > k; the other orderings are handled in the same way by pumping the most frequent symbol.
In the case i > j > k, we split w = uvxyz with u = v = x = λ and $y = a^\ell$, where ℓ ∈ {1, 2, 3}
is defined below; since ℓ ≤ 3 < m, we have |vxy| ≤ m. For any ℓ ≥ 1, pumping up only increases
the number of a's further, so it gives strings in L. So we only have to consider the string
obtained by pumping down.
Case 1: i = k + 2 (so j = k + 1). Choose ℓ = 3. Since |w| = 3k + 3 ≥ 4, we have k ≥ 1 and hence
i ≥ 3. Then $uxz = a^{i-3} b^j c^k = a^{k-1} b^j c^k$ with k − 1 < k < j. Thus uxz ∈ L.
Case 2: i > k + 2 and i > j + 1. Take ℓ = 1. Then $uxz = a^{i-1} b^j c^k$ with i − 1 > j > k.
Thus uxz ∈ L.
Case 3: i > k + 2 and i = j + 1. Take ℓ = 2. Then $uxz = a^{j-1} b^j c^k$. Since i > k + 2, we
have i − 2 = j − 1 > k, so the three counts j − 1, j and k are distinct. Thus uxz ∈ L.
This example shows that the converse of the pumping lemma does not hold: a language can have
all sufficiently long strings pumpable and yet not be context-free (this can be shown using a
stronger version of the pumping lemma called Ogden's lemma).
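A brute-force Python check of this claim for short strings: for every string of L with length between 4 and 10, it searches for a decomposition with |vxy| ≤ 4 that stays in L under pumping, and it confirms that the length-3 string aab has no such decomposition even with |vxy| ≤ 3. Only the exponents 0 to 4 are tested, so this is evidence for the case analysis rather than a proof; the helper names are ours.

```python
import re
from typing import Iterator, Tuple

def in_L(s: str) -> bool:
    """s = a^i b^j c^k with i, j, k pairwise distinct."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and len({len(m[1]), len(m[2]), len(m[3])}) == 3

def decompositions(w: str, m: int) -> Iterator[Tuple[str, str, str, str, str]]:
    """All w = uvxyz with |vxy| <= m and |vy| >= 1."""
    n = len(w)
    for start in range(n + 1):
        for end in range(start, min(n, start + m) + 1):  # vxy = w[start:end]
            for p in range(start, end + 1):              # v = w[start:p]
                for q in range(p, end + 1):              # x = w[p:q], y = w[q:end]
                    if (p - start) + (end - q) >= 1:
                        yield w[:start], w[start:p], w[p:q], w[q:end], w[end:]

def pumpable(w: str, m: int, max_power: int = 4) -> bool:
    """Some decomposition keeps uv^t xy^t z in L for t = 0..max_power."""
    return any(all(in_L(u + v * t + x + y * t + z) for t in range(max_power + 1))
               for u, v, x, y, z in decompositions(w, m))

def strings_of_L(max_len: int) -> Iterator[str]:
    for i in range(max_len + 1):
        for j in range(max_len + 1 - i):
            for k in range(max_len + 1 - i - j):
                if len({i, j, k}) == 3:
                    yield "a" * i + "b" * j + "c" * k

assert not pumpable("aab", 3)  # the shortest strings of L cannot be pumped down
assert all(pumpable(w, 4) for w in strings_of_L(10) if len(w) >= 4)
```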
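6. Given a context-free grammar G = (V, T, S, P), show how to construct a grammar G′ such
that $L(G') = L(G)^R$. Explain your answer.
Ans. Use the same variables, terminals and start symbol, and simply reverse the right-hand side
of every production: for each production A → α in P, G′ has the production $A \to \alpha^R$.
A straightforward induction on the height of parse trees shows that $A \Rightarrow^* w$ in G if
and only if $A \Rightarrow^* w^R$ in G′, for every variable A and every w ∈ T*; taking A = S
gives $L(G') = L(G)^R$.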
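A small Python sketch of this construction, checked on an example grammar chosen here only for illustration: S → aSb | aA, A → cA | c, which generates $\{a^{n+1} c^k b^n \mid n \ge 0, k \ge 1\}$. The enumeration helper assumes a grammar without λ-productions, so that any sentential form longer than the length bound can be discarded.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[str]]  # variable -> right-hand sides (upper case = variable)

def reverse_grammar(g: Grammar) -> Grammar:
    """G': same variables and start symbol, every right-hand side reversed."""
    return {var: [rhs[::-1] for rhs in rhss] for var, rhss in g.items()}

def generated(g: Grammar, start: str, max_len: int) -> Set[str]:
    """Terminal strings of length <= max_len; assumes g has no λ-productions."""
    found: Set[str] = set()
    seen: Set[str] = set()
    todo = [start]
    while todo:
        form = todo.pop()
        if form in seen:
            continue
        seen.add(form)
        i = next((k for k, s in enumerate(form) if s.isupper()), None)
        if i is None:
            found.add(form)
            continue
        for rhs in g[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= max_len:  # no λ-rules, so forms never shrink
                todo.append(new)
    return found

G: Grammar = {"S": ["aSb", "aA"], "A": ["cA", "c"]}
GR = reverse_grammar(G)  # S -> bSa | Aa, A -> Ac | c
for bound in range(1, 10):
    assert generated(GR, "S", bound) == {w[::-1] for w in generated(G, "S", bound)}
```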
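7. Give a DPDA for the following languages:
(a) $\{a^n b^{2n+3m} c^m \mid n, m \ge 1\}$
The DPDA has states q0, q1, q2, q3, q4, q5 and qf, where q0 is the initial state, qf is the final
state and z is the initial stack symbol. Its transitions are:
from q0 to itself: a, z → AAz; a, A → AAA
from q0 to q1: b, A → λ
from q1 to itself: b, A → λ
from q1 to q2: b, z → Bz
from q2 to itself: b, B → BB
from q2 to q3: c, B → λ
from q3 to q4: λ, B → λ
from q4 to q5: λ, B → λ
from q5 to q3: c, B → λ
from q5 to qf: λ, z → z
Each a pushes two A's, each of the first 2n b's pops one A, each remaining b pushes a B, and
each c pops three B's.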
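(b) $\{ww^R \mid w \in (ab)^*\}$
The DPDA has states q0, q1, q2, q3, q4 and qf, where q0 is the initial state and z is the initial
stack symbol. Both qf and q0 are final; q0 must be final so that the empty string (w = λ) is
accepted. Its transitions are:
from q0 to q1: a, z → z
from q1 to q2: b, z → Az; b, A → AA
from q2 to q1: a, A → A
from q2 to q3: b, A → A
from q3 to q4: a, A → λ
from q4 to q3: b, A → A
from q4 to qf: λ, z → z
Since $w = (ab)^n$ and $w^R = (ba)^n$, the machine pushes one A for each ab while reading w and
pops one A for each ba while reading $w^R$.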
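Both machines can be exercised with a short Python simulation. The sketch below encodes the transitions of (a) and (b) as dictionaries (the encoding is ours), runs each machine deterministically, taking a λ-move whenever one is defined for the current state and stack top, and compares the outcome with a direct description of each language on all short strings. It treats q0 as final in (b) so that the empty string is accepted.

```python
import re
from itertools import product
from typing import Dict, Set, Tuple

# (state, input symbol or "" for λ, stack top) -> (next state, pushed string).
# A pushed string lists the new top first, as in the notation "a, z → AAz".
Delta = Dict[Tuple[str, str, str], Tuple[str, str]]

DELTA_7A: Delta = {
    ("q0", "a", "z"): ("q0", "AAz"), ("q0", "a", "A"): ("q0", "AAA"),
    ("q0", "b", "A"): ("q1", ""),    ("q1", "b", "A"): ("q1", ""),
    ("q1", "b", "z"): ("q2", "Bz"),  ("q2", "b", "B"): ("q2", "BB"),
    ("q2", "c", "B"): ("q3", ""),    ("q3", "", "B"): ("q4", ""),
    ("q4", "", "B"): ("q5", ""),     ("q5", "c", "B"): ("q3", ""),
    ("q5", "", "z"): ("qf", "z"),
}
DELTA_7B: Delta = {
    ("q0", "a", "z"): ("q1", "z"),   ("q1", "b", "z"): ("q2", "Az"),
    ("q1", "b", "A"): ("q2", "AA"),  ("q2", "a", "A"): ("q1", "A"),
    ("q2", "b", "A"): ("q3", "A"),   ("q3", "a", "A"): ("q4", ""),
    ("q4", "b", "A"): ("q3", "A"),   ("q4", "", "z"): ("qf", "z"),
}

def run_dpda(delta: Delta, finals: Set[str], w: str) -> bool:
    """Deterministic run from q0 with z on the stack; accept by final state."""
    state, stack, pos = "q0", ["z"], 0   # stack[-1] is the top
    while stack:
        top = stack[-1]
        if (state, "", top) in delta:    # a λ-move never competes with a read
            state, push = delta[(state, "", top)]
        elif pos < len(w) and (state, w[pos], top) in delta:
            state, push = delta[(state, w[pos], top)]
            pos += 1
        else:
            break
        stack.pop()
        stack.extend(reversed(push))
    return pos == len(w) and state in finals

def in_7a(w: str) -> bool:
    m = re.fullmatch(r"(a+)(b+)(c+)", w)
    return m is not None and len(m[2]) == 2 * len(m[1]) + 3 * len(m[3])

def in_7b(w: str) -> bool:
    return any(w == "ab" * n + "ba" * n for n in range(len(w) // 4 + 1))

for n in range(9):
    for t in product("abc", repeat=n):
        w = "".join(t)
        assert run_dpda(DELTA_7A, {"qf"}, w) == in_7a(w), w
for n in range(13):
    for t in product("ab", repeat=n):
        w = "".join(t)
        assert run_dpda(DELTA_7B, {"q0", "qf"}, w) == in_7b(w), w
```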
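8. Let L be a DCFL over an alphabet Σ. Let $f_1(L) = \{w : wa \in L \text{ for some } a \in \Sigma\}$ and let
$f_2(L) = \{w : aw \in L \text{ for some } a \in \Sigma\}$. Only one of $f_1(L)$ and $f_2(L)$ is guaranteed to be a
DCFL. Which one? Explain your answer.
Soln. If L is a DCFL, we claim that $f_1(L)$ is a DCFL. Given a DPDA M for L, we can convert
it to a DPDA for $f_1(L)$ by converting to a final state any q ∈ Q from which it is possible
to arrive at a state qf ∈ F while consuming one input symbol. Strictly speaking, whether this is
possible depends on the stack contents as well as on q, so the construction must also record this
finite information, for example by annotating each stack symbol as it is pushed; this is the usual
proof that DCFLs are closed under right quotient by a regular language, here Σ. Making previously
nonfinal states into final states does not make the machine nondeterministic, so the resulting
machine is a DPDA and accepts $f_1(L)$.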
We claim that $f_2(L)$ is not necessarily a DCFL, even if L is a DCFL. Let
$L = \{c a^n b^n \mid n \ge 1\} \cup \{d a^n b^{2n} \mid n \ge 1\}$. Then L is a DCFL: a DPDA accepting L can easily be
constructed by going from the initial state to q1 or q2 without altering the stack, according to
whether the first input symbol is a c or a d (respectively). Now q1 can be the start state for
a DPDA for the language $\{a^n b^n \mid n \ge 1\}$ and q2 the start state for a DPDA for the language
$\{a^n b^{2n} \mid n \ge 1\}$. It is easy to see that the constructed machine is a DPDA and that it accepts
L. But $f_2(L) = \{a^n b^n \mid n \ge 1\} \cup \{a^n b^{2n} \mid n \ge 1\}$, which is not a DCFL.