Computation Theory (2nd class)
Branches: Software, Information Systems
Course instructor: Asst. Lect. Nuha Jameel Ibrahim
Source: www.uotechnology.edu.iq/dep-cs
_Computation Theory __________________________________________________________MSC Nuha J Ibrahim _
References:
Introduction to Computer Theory / Cohen
Algorithms for Compiler Design / O.G. Kakde
Languages
Until the middle of the twentieth century, a language was commonly defined as a means of understanding among members of the same group of beings: humans, animals, and even tiny organisms. This definition covers all kinds of understanding: talking, special signals, and sounds.
This definition held until Chomsky proposed that a language can be defined mathematically as follows:
1. A set of letters called the alphabet. This can be seen in any natural language; for example, the alphabet of English can be defined as:
E = {a, b, …, z}
2. By concatenating letters from the alphabet we get words.
3. The set of all permissible words over the alphabet makes up the language.
Languages can be classified into two types: natural languages and formal languages.
The definition above applies to both natural and formal languages. Just as not every concatenation of letters makes a permissible word in a natural language, the same holds for formal languages.
Note: a formal language deals with form, not meaning.
From this definition we can distinguish:
Alphabet: finite
Words: finite (each word has finite length)
Language: possibly infinite
One thing we must not forget is the empty string (or null string), the string of no letters, denoted by Λ. A language may contain Λ as a word.
Definitions
1. Function concatenation: if we concatenate the word xxx with the word xx, we obtain the word xxxxx. In general:
x^n concatenated with x^m = x^(n+m)
Example: let a=xx, b=xxx; then ab=xxxxx.
2. Function length: gives the number of letters in a word of a language.
Example: let a=xxxx, b=543, c=Λ; then
length(a)=4, length(b)=3, length(c)=0
Note: when parentheses are themselves letters of the alphabet, S={x, (, )}, then
length(xxxxx)=5
but length((xx)(xxx))=9
3. Function reverse: if a is a word in some language, then reverse(a) is the same string of letters spelled backward.
Example: let a=xxxx, b=543, c=aab; then
reverse(a)=xxxx, reverse(b)=345, reverse(c)=baa.
Note that the reverse of a word in a language is not always itself a word in that language.
Example: let the alphabet of the language L1 be A={0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
let L1={all words that do not start with zero},
and let c=210.
Then reverse(c)=012, which is not in L1.
4. Palindrome: the language PALINDROME={Λ, and all strings x such that reverse(x)=x}.
Example: aba, aabaa, bab, bbb, …
5. Kleene star *: given an alphabet ∑, we wish to define a language in which any string of letters from ∑ is a word, even the null string. We call this language the closure of the alphabet, written ∑*.
Example: if ∑={a,b,c}
then ∑*={Λ a b c aa ab ac ba bb bc ca cb cc aaa aab aac … bba bbb bbc … cca ccb ccc aaaa aaab …}
If S is the alphabet of a language, then S* is the closure of the alphabet.
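The five definitions above can be tried out with a small Python sketch. It is only an illustration: words are modelled as Python strings, Λ as the empty string, and the helper names (`concatenate`, `closure_up_to`, and so on) are chosen here and are not part of the notes. Since the closure is infinite, `closure_up_to` lists only the words up to a given length.

```python
from itertools import product

LAMBDA = ""  # assumption: the null string Λ is modelled as Python's empty string

def concatenate(a: str, b: str) -> str:
    """Concatenation: x^n followed by x^m gives x^(n+m)."""
    return a + b

def length(word: str) -> int:
    """Number of letters in the word; length(Λ) is 0."""
    return len(word)

def reverse(word: str) -> str:
    """The same letters spelled backward."""
    return word[::-1]

def is_palindrome(word: str) -> bool:
    """A word is in PALINDROME when it equals its own reverse."""
    return reverse(word) == word

def closure_up_to(alphabet, max_len):
    """List the words of alphabet* of length <= max_len, shortest first."""
    words = []
    for n in range(max_len + 1):
        for letters in product(sorted(alphabet), repeat=n):
            words.append("".join(letters))
    return words

print(concatenate("xx", "xxx"))           # the example a=xx, b=xxx
print(length("(xx)(xxx)"))                # parentheses counted as letters
print(reverse("210"))                     # reverse of a word of L1
print(closure_up_to({"a", "b", "c"}, 2))  # start of the closure of {a,b,c}
```

The reverse example shows why the result may fall outside the language: `reverse("210")` starts with zero, so it is not a word of L1.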
Example:
S={x}
S*={Λ, x^n | n>=1}
To prove that a certain word is in the closure language S*, we must show how it can be written as a concatenation of words from the set S.
Example: let S={a, ab}.
To find out whether the word abaab is in S*, we can factor it as follows:
(ab)(a)(ab)
Every factor is a word in S, so the whole word abaab is in S*.
In this example there is no other way to factor the word; we say the factoring is unique.
Sometimes, however, a word can be factored in different ways.
Example: let S={x}, so that
S*={Λ, x^n | n>=1}
The word xxxxxxxx can be factored into words of S* in several ways:
(xx)(xx)(xx)(xx), or
(x)(xx)(xxx)(xx), or
(xxx)(xxx)(xx), …
The method of proving that something exists by showing how to create it is called proof by constructive algorithm.
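Factoring a word into words of S can itself be written as a constructive algorithm. The following Python sketch is an illustration only (the function names are hypothetical): it tries every word of S as a first factor and recurses on what is left, so it returns all factorings, and the word is in S* exactly when at least one exists.

```python
def factorings(word, S):
    """Return every way to write `word` as a concatenation of words from S."""
    if word == "":
        return [[]]              # Λ is the concatenation of no words at all
    results = []
    for w in sorted(S):
        # skip Λ as a factor, otherwise the recursion would never end
        if w and word.startswith(w):
            for rest in factorings(word[len(w):], S):
                results.append([w] + rest)
    return results

def in_closure(word, S):
    """A word is in S* exactly when it has at least one factoring."""
    return len(factorings(word, S)) > 0

print(factorings("abaab", {"a", "ab"}))     # the factoring is unique
print(factorings("xxxxx", {"xx", "xxx"}))   # two different factorings
print(in_closure("abba", {"a", "ab"}))      # no factoring exists
```

The recursion is exponential in the worst case; it is meant for the short words used in these examples.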
If S=ø then S*={Λ}.
If S={Λ} then S*={Λ} as well.
The two sets S are not the same, but they have the same closure.
If S={w1 w2 w3}
then S*={Λ w1 w2 w3 w1w1 w1w2 … w1w1w1 w1w1w2 w1w2w1 …}
and S+={w1 w2 w3 w1w1 w1w2 … w1w1w1 w1w1w2 w1w2w1 …}
which means that S+ is S* without Λ (provided Λ is not itself a word of S).
Theorem 1
S*=S**
Proof
Every word in S** is made up of factors from S*. Every factor from S* is made up of factors from S. Therefore, every word in S** is made up of factors from S, and so every word in S** is also a word in S*. We can write this as:
S** ⊆ S*
Conversely, every word in S* is also a word in S**, since it is a concatenation of a single factor from S*:
S* ⊆ S**
This means S*=S**.
Regular Expression
The language-defining symbols we are about to create are called
regular expressions. The languages that are associated with these regular
expressions are called regular languages.
Example: consider the language L={Λ x xx xxx …}. Using star notation we may write
L=language(x*)
since x* is any string of x's (including Λ).
Example: if we have the alphabet ∑={a,b}
and L={a ab abb abbb abbbb …}
then L=language(ab*).
Example:
(ab)*=Λ or ab or abab or ababab or abababab or …
Example: let L1=language(xx*).
The language L1 can be defined by any of the expressions:
xx* or x^+ or xx*x* or x*xx* or x^+x* or x*x^+ or x*x*x*xx* …
where x^+ denotes one or more x's. Remember that x* can always be Λ.
Example:
language(ab*a)={aa aba abba abbba abbbba …}
Example:
language(a*b*)={Λ a b aa ab bb aaa aab abb bbb …}
ba and aba are not in this language. Note that a*b* ≠ (ab)*, since (ab)* contains the word abab, which a*b* does not.
Example: the following expressions both define the language L2={x^odd}:
x(xx)* or (xx)*x
But the expression x*xx* does not, since it includes the word (xx)x(x).
Example: consider the language T defined over the alphabet ∑={a,b,c}:
T={a c ab cb abb cbb abbb cbbb abbbb cbbbb …}
Then T=language((a+c)b*)
T=language(either a or c, then some b's)
Example: consider a finite language L that contains all the strings of a's and b's of length exactly three.
L={aaa aab aba abb baa bab bba bbb}
L=language((a+b)(a+b)(a+b))
L=language((a+b)^3)
Note: over the alphabet ∑={a,b}, if we want to refer to the set of all possible strings of a's and b's of any length (including Λ), we can write
(a+b)*
Example: we can describe all words that begin with a and end with b with the expression a(a+b)*b, which means a(arbitrary string)b.
Example: if we have the expression (a+b)*a(a+b)*, then the word abbaab can be considered to be of this form in three ways: (Λ)a(bbaab) or (abb)a(ab) or (abba)a(b).
Example: all words with at least two a's:
(a+b)*a(a+b)*a(a+b)*
=(some beginning)(the first important a)(some middle)(the second important a)(some end)
Other expressions that denote all the words with at least two a's are:
b*ab*a(a+b)*, (a+b)*ab*ab*, b*a(a+b)*ab*
Then we can write:
language((a+b)*a(a+b)*a(a+b)*)
=language(b*ab*a(a+b)*)
=language((a+b)*ab*ab*)
=language(b*a(a+b)*ab*)
=all words with at least two a's.
Note: we say that two regular expressions are equivalent if they describe the same language.
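Equivalence of regular expressions can be spot-checked by comparing the words they accept up to some length. The Python sketch below is an illustration under stated assumptions: it uses Python's standard `re` module, where union is written `|` instead of the `+` used in these notes, and the helpers `to_python` and `language_up_to` are names introduced here. Agreement up to a bounded length is evidence, not a proof, of equivalence.

```python
import re
from itertools import product

def to_python(expr: str) -> str:
    """Rewrite the notes' notation for Python's re module.

    Assumption: every '+' in `expr` means union (never a superscript plus),
    and Λ stands for the empty string.
    """
    return expr.replace("+", "|").replace("Λ", "")

def language_up_to(expr, alphabet, max_len):
    """Words over `alphabet` of length <= max_len that `expr` matches in full."""
    pattern = re.compile(to_python(expr))
    words = set()
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if pattern.fullmatch(word):
                words.add(word)
    return words

expressions = [
    "(a+b)*a(a+b)*a(a+b)*",
    "b*ab*a(a+b)*",
    "(a+b)*ab*ab*",
    "b*a(a+b)*ab*",
]
languages = [language_up_to(e, "ab", 7) for e in expressions]
print(all(lang == languages[0] for lang in languages))
print(language_up_to("a*b*", "ab", 4) == language_up_to("(ab)*", "ab", 4))
```

The first line printed compares the four expressions for "at least two a's"; the second compares a*b* with (ab)*, which differ already on the word a.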
Example: if we want all the words with exactly two a's, we can use the expression b*ab*ab*, which describes such words as
aab, baba, bbbabbabbbb, …
Example: the language of all words that have at least one a and at least one b is:
(a+b)*a(a+b)*b(a+b)* + (a+b)*b(a+b)*a(a+b)*
Note:
(a+b)*b(a+b)*a(a+b)* ≠ bb*aa*, since the left side includes the word aba, which the expression on the right side does not.
Note:
(a+b)* = (a+b)* + (a+b)*
(a+b)* = (a+b)*(a+b)*
(a+b)* = a(a+b)* + b(a+b)* + Λ
(a+b)* = (a+b)*ab(a+b)* + b*a*
Note: usually when we employ the star operation we are defining an infinite language. We can represent a finite language by using the plus alone.
Example:
L={abba baaa bbbb}
L=language(abba + baaa + bbbb)
Example:
L={Λ a aa bbb}
L=language(Λ + a + aa + bbb)
Example:
L={Λ a b ab bb abb bbb abbb bbbb …}
We can define L by using the expression b* + ab*
Definition
The set of regular expressions is defined by the following rules:
Rule 1: every letter of ∑ can be made into a regular expression; Λ is a regular expression.
Rule 2: if r1 and r2 are regular expressions, then so are: (r1), r1r2, r1+r2, r1*.
Rule 3: nothing else is a regular expression.
Remember that r1^+ = r1r1*
Definition
If S and T are sets of strings of letters (whether they are finite or infinite sets), we define the product set of strings of letters to be:
ST={all combinations of a string from S concatenated with a string from T}
Example: if S={a aa aaa}, T={bb bbb}
then ST={abb abbb aabb aabbb aaabb aaabbb}
(a+aa+aaa)(bb+bbb)=abb+abbb+aabb+aabbb+aaabb+aaabbb
Example: if P={a bb bab}, Q={Λ bbbb}
then PQ={a bb bab abbbb bbbbbb babbbbb}
(a+bb+bab)(Λ+bbbb)=a+bb+bab+ab^4+b^6+bab^5
Example: if M={Λ x xx}, N={Λ y yy yyy yyyy …}
then MN={Λ y yy yyy yyyy …
x xy xyy xyyy xyyyy …
xx xxy xxyy xxyyy xxyyyy …}
Using regular expressions we can write:
(Λ+x+xx)(y*)=y*+xy*+xxy*
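For finite sets the product set can be computed directly. This is a minimal Python sketch, assuming words are Python strings and Λ is the empty string; the function name `product_set` is introduced here for illustration. It cannot handle an infinite set such as N above.

```python
def product_set(S, T):
    """ST: every string from S concatenated with every string from T."""
    return {s + t for s in S for t in T}

S = {"a", "aa", "aaa"}
T = {"bb", "bbb"}
print(sorted(product_set(S, T), key=lambda w: (len(w), w)))

P = {"a", "bb", "bab"}
Q = {"", "bbbb"}          # "" plays the role of Λ
print(sorted(product_set(P, Q), key=lambda w: (len(w), w)))
```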
Definition
The following rules define the language associated with any regular expression.
Rule 1: the language associated with a regular expression that is just a single letter is that one-letter word alone, and the language associated with Λ is just {Λ}, a one-word language.
Rule 2: if r1 is a regular expression associated with the language L1 and r2 is a regular expression associated with the language L2, then:
i) The regular expression (r1)(r2) is associated with the language L1 times L2.
Language(r1r2)=L1L2
ii) The regular expression r1+r2 is associated with the language formed by the union of the sets L1 and L2.
Language(r1+r2)=L1+L2
iii) The language associated with the regular expression (r1)* is L1*, the Kleene closure of the set L1 as a set of words.
Language((r1)*)=L1*
Example: L={baa abba bababa}
The regular expression for this language is: (baa+abba+bababa)
Example: L={Λ x xx xxx xxxx xxxxx}
The regular expression for this language is: (Λ+x+xx+xxx+xxxx+xxxxx)
=(Λ+x)^5
Example: L=language((a+b)*(aa+bb)(a+b)*)
=(arbitrary)(double letter)(arbitrary)
The words {Λ a b ab ba aba bab abab baba …} contain no double letter, so they are not in L; they are exactly the words described by the regular expression (Λ+b)(ab)*(Λ+a).
Example:
E=(a+b)*a(a+b)*(a+Λ)(a+b)*a(a+b)*
E=(a+b)*a(a+b)*a(a+b)*a(a+b)* + (a+b)*a(a+b)*Λ(a+b)*a(a+b)*
We have: (a+b)*Λ(a+b)*=(a+b)*
Then: E=(a+b)*a(a+b)*a(a+b)*a(a+b)* + (a+b)*a(a+b)*a(a+b)*
The language associated with E is no different from the language associated with (a+b)*a(a+b)*a(a+b)*, because every word with at least three a's also has at least two a's.
Note:
(a+b*)*=(a+b)*
(a*)*=a*
(aa+ab*)* ≠ (aa+ab)*
(a*b*)*=(a+b)*
Example: E=[aa+bb+(ab+ba)(aa+bb)*(ab+ba)]*
This expression defines the language EVEN-EVEN of all words with an even number of a's and an even number of b's:
EVEN-EVEN={Λ aa bb aaaa aabb abab abba baab baba bbaa bbbb aaaabb aaabab …}
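The claim that E describes the words with an even number of a's and an even number of b's can be checked by brute force on short words. The Python sketch below is only a bounded check: it writes E in the syntax of Python's `re` module (union as `|`, square brackets replaced by parentheses) and compares it with a direct count of letters for every word up to length 8. The names `EVEN_EVEN` and `is_even_even` are introduced here.

```python
import re
from itertools import product

# E = [aa+bb+(ab+ba)(aa+bb)*(ab+ba)]* rewritten for Python's re module
EVEN_EVEN = re.compile(r"(aa|bb|(ab|ba)(aa|bb)*(ab|ba))*")

def is_even_even(word: str) -> bool:
    """Direct definition: even number of a's and even number of b's."""
    return word.count("a") % 2 == 0 and word.count("b") % 2 == 0

mismatches = []
for n in range(9):
    for letters in product("ab", repeat=n):
        word = "".join(letters)
        if bool(EVEN_EVEN.fullmatch(word)) != is_even_even(word):
            mismatches.append(word)

print(mismatches)   # an empty list means the two definitions agree up to length 8
```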
Finite Automata (FA)
A finite automaton is a collection of five things:
1. A finite set of states.
2. An alphabet ∑ of possible input letters, from which are formed strings that are to be read one letter at a time.
3. A finite set of transitions that tell, for each state and for each letter of the input alphabet, which state to go to next.
4. The initial state.
5. The set of final states.
Therefore, formally, a finite automaton is a five-tuple:
M = (Q, Σ, δ, q0, F)
where:
Q is a set of states of the finite automaton,
Σ is a set of input symbols, and
δ specifies the transitions in the automaton.
If from a state p there exists a transition going to state q on an input symbol a, then we write δ(p, a) = q. Hence, δ is a function whose domain is a set of ordered pairs (p, a), where p is a state and a is an input symbol, and whose range is a set of states.
Therefore δ defines a mapping from Q × Σ to Q.
q0 is the initial state, and F is a set of final states of the automaton.
[Figure: an example automaton with its transition diagram and transition table]
Let x be 010. To find out whether x is accepted by the automaton, we extend δ to strings as δ1 and proceed as follows:
δ1(q0, 0) = δ(q0, 0) = q1
Therefore, δ1(q0, 01) = δ(δ1(q0, 0), 1) = q0
δ1(q0, 010) = δ(δ1(q0, 01), 0) = q1
In the finite automaton discussed above, since δ defines a mapping from Q × Σ to Q, there exists exactly one transition from a state on an input symbol; therefore, this finite automaton is considered a deterministic finite automaton (DFA).
Therefore, we define the DFA as the finite automaton
M = (Q, Σ, δ, q0, F), such that there exists exactly one transition from a state on an input symbol.
Example: if ∑={a,b}, states={x,y,z}
Rules of transition:
1. From state x and input a go to state y.
2. From state x and input b go to state z.
3. From state y and input a go to state x.
4. From state y and input b go to state z.
5. From state z and any input stay at state z.
Let x be the start state (-) and z be the final state (+).
[Transition Diagram: states x (start), y and z (final) with the edges listed above]
The FA above accepts all strings that have the letter b in them and no other strings. The language associated with (or accepted by) this FA is the one defined by the regular expression: a*b(a+b)*
The set of all strings that leave us in a final state is called the language defined by the FA. The word abb is accepted by this FA, but the word aaa is not.
Transition Table
        a     b
-x      y     z
 y      x     z
+z      z     z
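The transition table above is all that is needed to run the machine. A minimal Python sketch follows, assuming the table is stored as a nested dictionary; the names `DELTA`, `run` and `accepts` are introduced here for illustration.

```python
# Transition table of the example FA: DELTA[state][letter] = next state
DELTA = {
    "x": {"a": "y", "b": "z"},
    "y": {"a": "x", "b": "z"},
    "z": {"a": "z", "b": "z"},
}
START = "x"
FINALS = {"z"}

def run(delta, start, word):
    """Follow exactly one transition per input letter and return the last state."""
    state = start
    for letter in word:
        state = delta[state][letter]
    return state

def accepts(delta, start, finals, word):
    """A word is accepted when the run ends in a final state."""
    return run(delta, start, word) in finals

for word in ["abb", "aaa", "", "aab"]:
    print(repr(word), accepts(DELTA, START, FINALS, word))
```

Because there is exactly one next state for every state and letter, the loop never has to choose between alternatives; that is what makes the machine deterministic.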
Example: The following FA accepts all strings over the alphabet {a,b} except Λ.
[Transition diagram]
The regular expression is: (a+b)(a+b)*=(a+b)^+
Example: The following FA accepts all words over the alphabet {a,b}.
[Transition diagram: a single state that is both start and final, with a loop labeled a,b]
The regular expression is: (a+b)*
Note: every language that can be accepted by an FA can be defined by a regular expression, and every language that can be defined by a regular expression can be accepted by some FA.
An FA that accepts no words at all is one of two types:
1. An FA that has no final states, like the following FA:
[Transition diagram]
2. An FA in which the final states cannot be reached, like the following FAs:
[Transition diagrams: two examples]
Example: The following FA accepts all strings over the alphabet {a,b} that start with a.
[Transition diagram]
The regular expression is: a(a+b)*
Example: The following FA accepts all strings over the alphabet {a,b} that contain a double letter.
[Transition diagram]
The regular expression is: (a+b)*(aa+bb)(a+b)*
Example: The following FA accepts the language defined by the regular expression: (a+b)(a+b)b(a+b)*
[Transition diagram]
Example: The following FA accepts only the word baa.
[Transition diagram]
Example: The following FA accepts the words baa and ab.
[Transition diagram]
Example: The following FA accepts the language defined by the regular expression: (a+ba*ba*b)^+
[Transition diagram]
Example: The following FA accepts the language defined by the regular expression: (a+ba*ba*b)*
[Transition diagram]
Example: The following FA accepts only the word Λ.
[Transition diagram]
Example: The following FA accepts all words over the alphabet {a,b} that end with a.
[Transition diagram]
The regular expression for this language is: (a+b)*a
Example: The following FA accepts all words over the alphabet {a,b} that do not end in b, including Λ.
[Transition diagram]
The regular expression for this language is: (a+b)*a + Λ
Example: The following FA accepts all words over the alphabet {a,b} with an odd number of a's.
[Transition diagram]
The regular expression for this language is: b*ab*(ab*ab*)*
Example: The following FA accepts all words over the alphabet {a,b} that have different first and last letters.
[Transition diagram]
The regular expression for this language is: a(a+b)*b + b(a+b)*a
Example: The following FA accepts the language defined by the regular expression (even-even): [aa+bb+(ab+ba)(aa+bb)*(ab+ba)]*
[Transition diagram]
Transition Graph
A Transition Graph (TG) is a collection of three things:
1. A finite set of states, at least one of which is designated as the start state (-) and some (maybe none) of which are designated as final states (+).
2. An alphabet ∑ of possible input letters from which input strings are formed.
3. A finite set of transitions that show how to go from one state to another based on reading specified substrings of input letters (possibly even the null string Λ).
Example: The following TG accepts the word baa.
[Transition graph: a single edge labeled baa from the start state (-) to the final state (+)]
Example: The following TG accepts the words with double letters.
[Transition graph: a start state with a loop labeled a,b, an edge labeled aa,bb to a final state (+), and a loop labeled a,b on the final state]
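Because a TG edge may read several letters at once (or none), a word can follow several paths, so deciding acceptance means searching for at least one path that ends in a final state. The Python sketch below is an illustration: the edge list encodes the double-letter TG as it is assumed to look (loops labeled a,b on both states, edges aa and bb from start to final), and the names `tg_accepts` and `DOUBLE_LETTER` are introduced here.

```python
def tg_accepts(edges, starts, finals, word):
    """edges: list of (source, label, target); a label is any string, '' for Λ."""
    seen = set()
    todo = [(s, 0) for s in starts]        # (state, number of letters read)
    while todo:
        state, pos = todo.pop()
        if (state, pos) in seen:
            continue                       # also prevents looping on Λ-cycles
        seen.add((state, pos))
        if pos == len(word) and state in finals:
            return True
        for source, label, target in edges:
            if source == state and word.startswith(label, pos):
                todo.append((target, pos + len(label)))
    return False

# Assumed shape of the double-letter TG
DOUBLE_LETTER = [
    ("-", "a", "-"), ("-", "b", "-"),
    ("-", "aa", "+"), ("-", "bb", "+"),
    ("+", "a", "+"), ("+", "b", "+"),
]

for word in ["abba", "abab", "aa", ""]:
    print(repr(word), tg_accepts(DOUBLE_LETTER, {"-"}, {"+"}, word))
```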
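Example: The following TG accepts the word baab in two different ways.
[Transition graph with edges labeled ba, ab, baa and b]
Note: in a TG some words have several paths that accept them, while in an FA there is only one path for each word.
Note: every FA is also a TG.
Example: The following TG accepts nothing.
[Transition graph]
Example: The following TG accepts Λ.
[Transition graph: a single state that is both start and final (+-)]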
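Example: The following TG accepts the words {Λ, baa, abba}.
[Transition graph with edges labeled Λ, abba and baa]
Example: The following TG accepts all words that end with b.
[Transition graph: a start state (-) with a loop labeled a,b and an edge labeled b to the final state (+)]
Example: The following TG accepts all words that have different first and last letters.
[Transition graph with two final states]
Example: The following TG accepts all words in which a's occur in even clumps only and which end in three or more b's.
[Transition graph with edges labeled aa and b]
Example: The following TG accepts even-even.
[Transition graph: two states with loops labeled aa,bb and edges labeled ab,ba in both directions]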
Kleene's Theorem
Any language that can be defined by:
1. a regular expression,
or 2. a finite automaton,
or 3. a transition graph
can be defined by all three methods.
Proof
The three sections of our proof will be:
Part 1: every language that can be defined by an FA can also be defined by a TG.
Part 2: every language that can be defined by a TG can also be defined by an RE.
Part 3: every language that can be defined by an RE can also be defined by an FA.
The proof of part 1
This is the easiest part. Every FA is itself a TG. Therefore, any language that has been defined by an FA has already been defined by a TG.
The proof of part 2
The proof of this part will be by constructive algorithm. This means that we present a procedure that starts out with a TG and ends up with an RE that defines the same language.
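• Let there be only one start state. If the TG has several start states, add a new unique start state and connect it by Λ-edges to each of the old start states, which then stop being start states.
[Figure: the start states -1, -3 and -5 become ordinary states 1, 3 and 5, each reached by a Λ-edge from a single new start state; their outgoing edges are unchanged]
• Let there be only one final state. If the TG has several final states, add a new unique final state and connect each of the old final states to it by a Λ-edge; the old final states then stop being final states.
[Figure: the final states 9+ and 12+ become ordinary states 9 and 12, each with a Λ-edge to a single new final state (+); their incoming edges are unchanged]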
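• Allow the edges to be labeled with regular expressions, and reduce the number of edges or states at each step:
  - Several loops labeled r1, r2, r3 on the same state x become one loop labeled r1+r2+r3.
  - Several edges labeled r1, r2 from state 3 to state 4 become one edge labeled r1+r2.
  - An edge labeled r1 into a state, followed by an edge labeled r2 out of it, becomes one edge labeled r1r2 once that state is eliminated.
  - An edge labeled r1 from state 1 into state 2, a loop labeled r2 on state 2, and an edge labeled r3 from state 2 to state 3 become one edge from 1 to 3 labeled r1r2*r3, and state 2 is eliminated.
  - When the eliminated state has several incoming or outgoing edges, every incoming/outgoing pair gets its own bypass edge, for example r1r2*r3, r1r2*r4 and r1r2*r5, or r1r2*r3 and r4r2*r3.
[Figures: before and after pictures for each of these rules]
• Repeat the last step again and again until we eliminate all the states from the TG except the unique start state and the unique final state.
[Figure: edges labeled r1, r2, …, rn from - to +]
Becomes
[Figure: a single edge labeled r1+r2+…+rn from - to +]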
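The elimination procedure can be sketched in Python. This is an illustration under several assumptions, not the notes' own algorithm text: the TG already has a unique start state with no incoming edges and a unique final state with no outgoing edges, parallel edges have already been merged into one label per pair of states, labels are written in the syntax of Python's `re` module (`|` for union, the empty string for Λ), and the names `union` and `eliminate_states` are introduced here. The sample TG is an assumed encoding of a graph with an edge aa,bb into a middle state, a loop a,b on it, and edges aa and bb to two old final states.

```python
import re
from itertools import product

def union(r, s):
    """r + s, where None means that there is no edge yet."""
    return s if r is None else f"(?:{r}|{s})"

def eliminate_states(edges, states, start, final):
    """edges maps (p, q) to the regex labelling the edge p -> q."""
    edges = dict(edges)
    for s in states:
        if s in (start, final):
            continue
        loop = edges.pop((s, s), None)
        loop = "" if loop is None else f"(?:{loop})*"         # r2*
        ins = {p: r for (p, q), r in edges.items() if q == s}
        outs = {q: r for (p, q), r in edges.items() if p == s}
        for p in ins:
            del edges[(p, s)]
        for q in outs:
            del edges[(s, q)]
        for p, r_in in ins.items():
            for q, r_out in outs.items():
                bypass = f"(?:{r_in}){loop}(?:{r_out})"       # r1 r2* r3
                edges[(p, q)] = union(edges.get((p, q)), bypass)
    return edges.get((start, final))

tg = {
    ("-", "m"): "aa|bb",
    ("m", "m"): "a|b",
    ("m", "f1"): "aa",
    ("m", "f2"): "bb",
    ("f1", "+"): "",      # Λ-edges to the new unique final state
    ("f2", "+"): "",
}
regex = eliminate_states(tg, ["-", "m", "f1", "f2", "+"], "-", "+")
print(regex)
```

The printed expression is heavily parenthesised, but it defines the same language as (aa+bb)(a+b)*(aa+bb).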
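Example
Find the RE that defines the same language accepted by the following TG, using Kleene's theorem.
[Transition graph: a start state (-), an edge labeled aa,bb to a middle state with a loop labeled a,b, and edges labeled aa and bb to two final states (+)]
Step 1: add a unique final state, reached by Λ-edges from the two old final states.
Step 2: rewrite the edge labels as regular expressions: aa,bb becomes aa+bb and a,b becomes a+b.
Step 3: eliminate the middle state and the old final states; the start state is now joined to the final state by two edges:
(aa+bb)(a+b)*aa
(aa+bb)(a+b)*bb
Step 4: combine the two parallel edges into one:
RE=(aa+bb)(a+b)*(aa+bb)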
H.W.
Find the RE that defines the same language accepted by the following TG, using Kleene's theorem.
[Transition graph with a state marked +-, two edges labeled ab,ba and two edges labeled aa,bb]
The proof of part 3
Rule 1: there is an FA that accepts any particular letter of the alphabet. There is an FA that accepts only the word Λ.
For example, if x is in ∑, then the FA will be:
[Transition diagram: the start state (-) goes on x to the final state (+); all of ∑ except x from the start state, and all of ∑ from the final state, lead to a third state that loops on all of ∑]
The FA that accepts only Λ will be:
[Transition diagram: a state that is both start and final (+-) goes on a,b to a second state, which loops on a,b]
Rule 2: if there is an FA called FA1 that accepts the language defined by the regular expression r1, and there is an FA called FA2 that accepts the language defined by the regular expression r2, then there is an FA called FA3 that accepts the language defined by the regular expression (r1+r2).
We can describe the algorithm for forming FA3 as follows:
Starting with two machines, FA1 with states x1, x2, x3, … and FA2 with states y1, y2, y3, …, build a new machine FA3 with states z1, z2, z3, … where each z is of the form "x_something or y_something". If either the x part or the y part is a final state, then the corresponding z is a final state. To go from one z to another by reading a letter from the input string, we see what happens to the x part and to the y part and go to the new z accordingly. We can write this as a formula:
z_new after letter p = [x_new after letter p] or [y_new after letter p]
Example
FA1 accepts all words with a double a in them, and FA2 accepts all words ending in b. We need to build FA3 that accepts all words that have a double a or that end in b.
[Transition diagrams: FA1 with states -x1, x2 and +x3; FA2 with states -y1 and +y2]
The transition table for FA1
        a     b
-x1     x2    x1
 x2     x3    x1
+x3     x3    x3
The transition table for FA2
        a     b
-y1     y1    y2
+y2     y1    y2
z1=x1 or y1
z2=x2 or y1
z3=x1 or y2
z4=x3 or y1
z5=x3 or y2
The transition table for FA3
        a     b
-z1     z2    z3
 z2     z4    z3
+z3     z2    z3
+z4     z4    z5
+z5     z4    z5
[Transition diagram: FA3 with states -z1, z2, +z3, +z4 and +z5]
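The construction of FA3 can be sketched in Python: each z state is a pair (x part, y part), and only the pairs reachable from the start pair are generated. This is an illustrative sketch; the representation of an FA as a tuple `(delta, start, finals)` and the name `union_fa` are assumptions made here.

```python
def union_fa(fa1, fa2, alphabet):
    """Each FA is (delta, start, finals) with delta[state][letter] = next state."""
    (d1, s1, f1), (d2, s2, f2) = fa1, fa2
    start = (s1, s2)
    table, todo = {}, [start]
    while todo:
        z = todo.pop()
        if z in table:
            continue
        x, y = z
        # z_new after a letter = (x_new after the letter, y_new after the letter)
        table[z] = {c: (d1[x][c], d2[y][c]) for c in alphabet}
        todo.extend(table[z].values())
    # z is final if either its x part or its y part is final
    finals = {(x, y) for (x, y) in table if x in f1 or y in f2}
    return table, start, finals

FA1 = ({"x1": {"a": "x2", "b": "x1"},
        "x2": {"a": "x3", "b": "x1"},
        "x3": {"a": "x3", "b": "x3"}}, "x1", {"x3"})   # words with a double a
FA2 = ({"y1": {"a": "y1", "b": "y2"},
        "y2": {"a": "y1", "b": "y2"}}, "y1", {"y2"})   # words ending in b

table, start, finals = union_fa(FA1, FA2, "ab")
for z in sorted(table):
    print(z, table[z], "final" if z in finals else "")
```

The five pairs printed correspond to z1 through z5 in the transition table for FA3.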
H.W.
Let FA1 accept all words ending in a, and let FA2 accept all words with an odd number of letters (odd length). Build FA3 that accepts all words that have odd length or end in a, using Kleene's theorem.
[Transition diagrams: FA1 with states -x1 and +x2; FA2 with states -y1 and +y2 joined by edges labeled a,b]
Rule 3: if there is an FA1 that accepts the language defined by the regular expression r1 and an FA2 that accepts the language defined by the regular expression r2, then there is an FA3 that accepts the language defined by the concatenation r1r2.
We can describe the algorithm for forming FA3 as follows:
We make a z state for each non-final x state in FA1. For each final state in FA1 we establish a z state that expresses the option that we are continuing on FA1 or are beginning on FA2. From there we establish z states for all situations of the form:
Are in x_something, continuing on FA1
Or
Have just started at y1, about to continue on FA2
Or
Are in y_something, continuing on FA2
Example
FA1 accepts all words with a double a in them, and FA2 accepts all words ending in b. We need to build FA3 that accepts all words that have a double a and end in b.
[Transition diagrams: FA1 with states -x1, x2 and +x3; FA2 with states -y1 and +y2]
The transition table for FA1
        a     b
-x1     x2    x1
 x2     x3    x1
+x3     x3    x3
The transition table for FA2
        a     b
-y1     y1    y2
+y2     y1    y2
z1=x1
z2=x2
z3=x3 or y1
z4=x3 or y2 or y1
The transition table for FA3
        a     b
-z1     z2    z1
 z2     z3    z1
 z3     z3    z4
+z4     z3    z4
[Transition diagram: FA3 with states -z1, z2, z3 and +z4]
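The z states of the concatenation machine can be generated mechanically. In the Python sketch below, each z state is a pair made of one x state and the set of y states we might currently be in; whenever the x part is final, the start state of FA2 is added as an option. The representation `(delta, start, finals)` and the names `concat_fa` and `accepts` are assumptions made for this illustration.

```python
def concat_fa(fa1, fa2, alphabet):
    """Each FA is (delta, start, finals) with delta[state][letter] = next state."""
    (d1, s1, f1), (d2, s2, f2) = fa1, fa2

    def with_option(x, ys):
        # reaching a final x state gives the option of beginning on FA2
        return (x, frozenset(ys | {s2})) if x in f1 else (x, frozenset(ys))

    start = with_option(s1, set())
    table, todo = {}, [start]
    while todo:
        z = todo.pop()
        if z in table:
            continue
        x, ys = z
        table[z] = {c: with_option(d1[x][c], {d2[y][c] for y in ys}) for c in alphabet}
        todo.extend(table[z].values())
    finals = {z for z in table if z[1] & f2}   # some y part is final in FA2
    return table, start, finals

def accepts(machine, word):
    table, state, finals = machine
    for letter in word:
        state = table[state][letter]
    return state in finals

FA1 = ({"x1": {"a": "x2", "b": "x1"},
        "x2": {"a": "x3", "b": "x1"},
        "x3": {"a": "x3", "b": "x3"}}, "x1", {"x3"})   # words with a double a
FA2 = ({"y1": {"a": "y1", "b": "y2"},
        "y2": {"a": "y1", "b": "y2"}}, "y1", {"y2"})   # words ending in b

FA3 = concat_fa(FA1, FA2, "ab")
print(len(FA3[0]))                               # number of z states
print(accepts(FA3, "baab"), accepts(FA3, "aaba"))
```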
H.W.
Let FA1 accept all words with a double a in them, and let FA2 accept all words with an odd number of letters (odd length). Build FA3 that accepts all words that have a double a in them and have odd length, using Kleene's theorem.
[Transition diagrams: FA1 with states -x1, x2 and +x3; FA2 with states -y1 and +y2 joined by edges labeled a,b]
Rule 4: if r is a regular expression and FA1 accepts exactly the language defined by r, then there is an FA2 that accepts exactly the language defined by r*.
We can describe the algorithm for forming FA2 as follows:
Each z state corresponds to some collection of x states. We must remember that each time we reach a final state, it is possible that we have to start over again at x1.
Remember that the start state must also be a final state, since Λ is always a word in the language of r*.
Example
Suppose FA1 accepts the language defined by the regular expression:
r=a*+aa*b
We want to build FA2 that accepts the language defined by r*.
[Transition diagram: FA1 with states -,+x1, +x2, +x3 and x4]
The transition table for FA1
          a     b
-,+ x1    x2    x4
  + x2    x2    x3
  + x3    x4    x4
    x4    x4    x4
z1=x1
z2=x4
z3=x2 or x1
z4=x3 or x4 or x1
z5=x4 or x2 or x1
The transition table for FA2
          a     b
-,+ z1    z3    z2
    z2    z2    z2
  + z3    z3    z4
  + z4    z5    z2
  + z5    z5    z4
[Transition diagram: FA2 with states -,+z1, z2, +z3, +z4 and +z5]
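The table for FA2 can be reproduced with a short Python sketch. Each z state is a set of x states; whenever that set contains a final state of FA1, the start state x1 is added to it, because we may have to start over. To make sure Λ is accepted in general, the sketch uses a separate start marker (`"START"`, a name chosen here) that is always final and otherwise behaves like x1. The representation and function names are assumptions for illustration.

```python
def star_fa(delta, start, finals, alphabet):
    """delta[state][letter] = next state; returns (table, new_start, new_finals)."""
    def step(xs, letter):
        nxt = {delta[x][letter] for x in xs}
        if nxt & finals:
            nxt.add(start)            # a final state was reached: option to start over
        return frozenset(nxt)

    START = "START"                   # extra start state, always final (accepts Λ)
    table, todo = {}, [START]
    while todo:
        z = todo.pop()
        if z in table:
            continue
        xs = frozenset({start}) if z == START else z
        table[z] = {c: step(xs, c) for c in alphabet}
        todo.extend(table[z].values())
    new_finals = {z for z in table if z == START or z & finals}
    return table, START, new_finals

def accepts(machine, word):
    table, state, finals = machine
    for letter in word:
        state = table[state][letter]
    return state in finals

# FA1 for r = a* + aa*b
DELTA1 = {"x1": {"a": "x2", "b": "x4"},
          "x2": {"a": "x2", "b": "x3"},
          "x3": {"a": "x4", "b": "x4"},
          "x4": {"a": "x4", "b": "x4"}}
FA2 = star_fa(DELTA1, "x1", {"x1", "x2", "x3"}, "ab")
print(len(FA2[0]))                                  # number of z states
print(accepts(FA2, "aaba"), accepts(FA2, "abb"))
```

In this example the `"START"` marker plays the role of z1, and the four sets reached from it correspond to z2 through z5.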
H.W.
Let FA1 accept the language defined by r1; find FA2 that accepts the language defined by r1*, using Kleene's theorem.
r1=aa*bb*
[Transition diagram: FA1 with states -x1, x2, x3 and +x4]
NONDETERMINISM
A nondeterministic finite automaton (NFA) is a collection of three things:
1. A finite set of states with one start state (-) and some final states (+).
2. An alphabet ∑ of possible input letters.
3. A finite set of transitions that describe how to proceed from each state to other states along edges labeled with letters of the alphabet (but not the null string Λ), where we allow the possibility of more than one edge with the same label from any state, and of some states for which certain input letters have no edge.
We can convert any NFA into a TG with no repeated labels from any single state, as in the following:
[Figure: state 1 with three edges labeled a, going to states 2, 3 and 4]
Is equivalent to
[Figure: state 1 with a single edge labeled a to a new state, which has Λ-edges to states 2, 3 and 4]
Any FA will satisfy the definition of an NFA. We have:
1. Every FA is an NFA.
2. Every NFA has an equivalent TG.
3. By Kleene's theorem, every TG has an equivalent FA.
Therefore:
languages of FA's ⊆ languages of NFA's ⊆ languages of TG's = languages of FA's
Theorem
FA = NFA
By which we mean that any language defined by a nondeterministic finite automaton is also definable by a deterministic (ordinary) finite automaton and vice versa.
NON-DETERMINISTIC FINITE AUTOMATA
If the basic finite automaton model is modified in such a way that from a state on an input symbol zero, one, or more transitions are permitted, then the corresponding finite automaton is called a "non-deterministic finite automaton" (NFA). Therefore, an NFA is a finite automaton in which there may exist more than one path corresponding to x in Σ* (because zero, one, or more transitions are permitted from a state on an input symbol), whereas in a DFA there exists exactly one path corresponding to x in Σ*. Hence, an NFA is nothing more than a finite automaton
M = (Q, Σ, δ, q0, F)
in which δ defines a mapping from Q × Σ to 2^Q (to take care of zero, one, or more transitions).
[Figure: an example NFA and its transition diagram]
Example
Let FA1 be
[Transition diagram of FA1]
And let FA2 be
[Transition diagram of FA2]
Then NFA3 = FA1 + FA2 is
[Transition diagram of NFA3]
It is sometimes easier to understand what a language is from the picture of an NFA that accepts it than from the picture of an FA, as in the following example.
Example
The NFA and FA below accept the language of all words that contain either a triple a (the substring aaa) or a triple b (the substring bbb) or both.
[Transition diagram of the NFA]
[Transition diagram of the FA]
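An NFA can be run by keeping track of the set of all states it could be in after each letter. The Python sketch below is an illustration: the transition relation `NFA` is an assumed encoding of an NFA for the triple-letter language (a start state that loops on a,b, one chain of three a-edges and one chain of three b-edges into a final state that loops on a,b), and all state names are hypothetical.

```python
from itertools import product

# Assumed NFA: (state, letter) -> set of possible next states
NFA = {
    ("s", "a"): {"s", "a1"}, ("s", "b"): {"s", "b1"},
    ("a1", "a"): {"a2"}, ("a2", "a"): {"f"},
    ("b1", "b"): {"b2"}, ("b2", "b"): {"f"},
    ("f", "a"): {"f"}, ("f", "b"): {"f"},
}
START, FINALS = "s", {"f"}

def nfa_accepts(delta, start, finals, word):
    """Accept when at least one path for `word` ends in a final state."""
    current = {start}
    for letter in word:
        # a missing entry means the state has no edge for this letter
        current = {q for p in current for q in delta.get((p, letter), set())}
    return bool(current & finals)

disagreements = [
    "".join(w)
    for n in range(8)
    for w in product("ab", repeat=n)
    if nfa_accepts(NFA, START, FINALS, "".join(w)) != ("aaa" in "".join(w) or "bbb" in "".join(w))
]
print(disagreements)   # an empty list: the NFA agrees with the definition up to length 7
```

Tracking the set of current states is also the idea behind converting an NFA into an FA: each set that can occur becomes one state of the deterministic machine.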
COMPARISON TABLE FOR AUTOMATA
                                                    FA                        TG                NFA
Start states                                        One                       One or more       One
Final states                                        Some or none              Some or none      Some or none
Edge labels                                         Letters from ∑            Words from ∑*     Letters from ∑
Number of edges from each state                     One for each letter in ∑  Arbitrary         Arbitrary
Deterministic (every input string has one path)     Yes                       Not necessarily   Not necessarily
Every path represents one word                      Yes                       Yes               Yes
Acceptance of Strings by Non-deterministic Finite Automata
An NFA is a finite automaton in which there may exist more than one path corresponding to x in Σ*. When this is the case, we must test the multiple paths corresponding to x in order to decide whether x is accepted by the NFA, because for the NFA to accept x, at least one path corresponding to x must start in the initial state and end in one of the final states. In a DFA, by contrast, there exists exactly one path corresponding to x in Σ*, so it is enough to test whether that path starts in the initial state and ends in one of the final states.
Therefore, if x is a string made of symbols in Σ of the NFA (i.e., x is in Σ*), then x is accepted by the NFA if at least one path exists that corresponds to x, starts in the initial state, and ends in one of the final states of the NFA. Since x is a member of Σ* and there may exist zero, one, or more transitions from a state on an input symbol, we define a new transition function, δ1, which defines a mapping from 2^Q × Σ* to 2^Q; if δ1({q0}, x) = P, where P is a set of states containing at least one member of F, then x is accepted by the NFA.
For example, consider the finite automaton shown below:
[Figure: an example NFA and the computation of δ1({q0}, 0111), used to decide whether x = 0111 is accepted]
THE NFA WITH ε-MOVES
If a finite automaton is modified to permit transitions without input symbols, along with zero, one, or more transitions on the input symbols, then we get an NFA with "ε-moves," because the transitions made without symbols are called "ε-transitions."
[Figure: finite automaton with ε-moves, with states q0, q1 and q2]
This is an NFA with ε-moves because it is possible to move from state q0 to q1 without consuming any of the input symbols. Similarly, we can also move from state q1 to q2 without consuming any input symbols. Since it is a finite automaton, an NFA with ε-moves will also be denoted as a five-tuple:
M = (Q, Σ, δ, q0, F)
where Q, Σ, q0, and F have the usual meanings, and δ defines a mapping from Q × (Σ ∪ {ε}) to 2^Q (to take care of the ε-transitions as well as the non-ε-transitions).
Acceptance of a String by the NFA with ε-Moves
A string x in Σ* will be accepted by the NFA with ε-moves if at least one path exists that corresponds to x, starts in the initial state, and ends in one of the final states. But since this path may be formed by ε-transitions as well as non-ε-transitions, to find out whether x is accepted by the NFA with ε-moves, we must define a function ε-closure(q), where q is a state of the automaton.
The function ε-closure(q) is defined as follows:
ε-closure(q) = set of all those states of the automaton that can be reached from q on a path labeled by ε.
For example, in the NFA with ε-moves given above:
ε-closure(q0) = {q0, q1, q2}
ε-closure(q1) = {q1, q2}
ε-closure(q2) = {q2}
The set ε-closure(q) is never empty, because q is always reachable from itself without consuming any input symbol; that is, q always belongs to ε-closure(q).
If P is a set of states, then the ε-closure function can be extended to ε-closure(P), the union of ε-closure(q) over all states q in P.
Since x is a member of Σ*, and there may exist zero, one, or more transitions from a state on an input symbol, we define a new transition function, δ1, which defines a mapping from 2^Q × Σ* to 2^Q. If x is written as wa, where a is the last symbol of x and w is the string made of the remaining symbols of x, then:
δ1(P, x) = ε-closure(δ(δ1(P, w), a)), with δ1(P, Λ) = P
since δ1 defines a mapping from 2^Q × Σ* to 2^Q. The string x is accepted if δ1(ε-closure(q0), x) = P such that P contains at least one member of F.
For example, in the NFA with ε-moves given above, if x = 01, then to find out whether x is accepted by the automaton, we compute:
δ1(ε-closure(q0), 01) = ε-closure(δ(δ1(ε-closure(q0), 0), 1)) = ε-closure({q1}) = {q1, q2}
Since q2 is a final state, x = 01 is accepted by the automaton.
Equivalence of NFA with ∈-Moves to NFA Without ∈-Moves
For every NFA with ∈ -moves, there exists an equivalent NFA without ∈ -moves that accepts the same language. To
obtain an equivalent NFA without
1
∈ -moves, given an NFA with ∈ -moves, what is required is an elimination of
∈ -transitions from a given automata. But simply eliminating the ∈ -transitions from a given NFA with ∈ -moves will
change the language accepted by the automata. Hence, for every ∈ -transition to be eliminated, we have to add some
non-∈ -transitions as substitutes in order to maintain the language's acceptance by the automata. Therefore,
transforming an NFA with ∈ -moves to and NFA without ∈ -moves involves finding the non-∈ -transitions that must be
added to the automata for every ∈ -transition to be eliminated.
Consider the NFA with ∈ -moves shown in
Therefore, by adding these non-∈ -transitions, and by making the initial state one of the final states, we get the automata shown in.
Therefore, when transforming an NFA with ∈-moves into an NFA without ∈-moves, only the transitions need to be changed; the states remain the same. But if a given NFA with start state q0 and ∈-moves accepts ∈ (i.e., if ∈-closure(q0) contains a member of F), then q0 must also be marked as one of the final states if it is not already a member of F. Hence:
If M = (Q, Σ, δ, q0, F) is an NFA with ∈-moves, then its equivalent NFA without ∈-moves will be M1 = (Q, Σ, δ1, q0, F1)
where δ1(q, a) = ∈-closure(δ(∈-closure(q), a))
and
F1 = F ∪ {q0} if ∈-closure(q0) contains a member of F
F1 = F otherwise
For example, consider the following NFA with ∈-moves (the empty move is written λ in the table):
M = ({q0, q1, q2}, {0, 1}, δ, q0, {q2})
where δ is given by:
δ       0       1       λ
q0      {q0}    Ф       {q1}
q1      Ф       {q1}    {q2}
q2      Ф       {q2}    Ф
Its equivalent NFA without λ-moves will be:
M1 = ({q0, q1, q2}, {0, 1}, δ1, q0, {q0, q2})
where δ1 is given by:
δ1      0               1
q0      {q0, q1, q2}    {q1, q2}
q1      Ф               {q1, q2}
q2      Ф               {q2}
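The construction δ1(q, a) = ∈-closure(δ(∈-closure(q), a)) can be sketched in Python as follows. This is only an illustrative sketch: the dictionary encoding of δ, the label EPS for the empty move, and the function names are assumptions made for this example, not part of the original notes.

```python
EPS = "eps"  # assumed label for the empty move (∈ or λ in the notes)

# delta maps (state, symbol) to a set of states; a missing entry means Ф.
delta = {
    ("q0", "0"): {"q0"}, ("q1", "1"): {"q1"}, ("q2", "1"): {"q2"},
    ("q0", EPS): {"q1"}, ("q1", EPS): {"q2"},
}
states, alphabet, start, finals = ["q0", "q1", "q2"], ["0", "1"], "q0", {"q2"}

def closure(P, delta):
    """Set of states reachable from the set P using only empty moves."""
    result, stack = set(P), list(P)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in result:
                result.add(r)
                stack.append(r)
    return result

def move(P, a, delta):
    """Union of delta(q, a) over all q in P."""
    out = set()
    for q in P:
        out |= delta.get((q, a), set())
    return out

def remove_empty_moves(states, alphabet, delta, start, finals):
    """Build delta1 and F1 as in the definition of M1."""
    delta1 = {}
    for q in states:
        for a in alphabet:
            target = closure(move(closure({q}, delta), a, delta), delta)
            if target:
                delta1[(q, a)] = target
    finals1 = set(finals)
    if closure({start}, delta) & finals:
        finals1.add(start)  # the original machine accepts the empty string
    return delta1, finals1

delta1, finals1 = remove_empty_moves(states, alphabet, delta, start, finals)
```

Run on the example machine, the sketch reproduces the table of δ1 and the final-state set {q0, q2}.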
FINITE AUTOMATA WITH OUTPUT
We shall investigate two different models for FA's with output
capabilities; these are Moore machine and Mealy machine.
A Moore machine is a collection of five things:
1. A finite set of states q0,q1,q2,… where q0 is designated as the start
state.
2. An alphabet of letters for forming the input string ∑= { a, b, c, …}.
3. An alphabet of possible output characters Г = { x, y, z, …}.
4. A transition table that shows for each state and each input letter
what state is reached next.
5. An output table that shows what character from Г is printed by
each state that is entered.
A Moore machine does not define a language of accepted words, since
every input string creates an output string and there is no such thing as a
final state. The processing is terminated when the last input letter is read
and the last output character is printed.
Example
Input alphabet: ∑ = {a, b}
Output alphabet: Г = {0, 1}
Names of states: q0, q1, q2, q3. (q0 = start state)
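Transition table and output table:
Old state    Output (the character         New state        New state
             printed in the old state)     after input a    after input b
-q0          1                             q1               q3
 q1          0                             q3               q1
 q2          0                             q0               q3
 q3          1                             q3               q2
The Moore machine is:
[State diagram: four circles labeled q0/1, q1/0, q2/0 and q3/1, connected by the a-edges and b-edges listed in the transition table]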
Note: the two symbols inside the circle are separated by a slash "/", on the
left side is the name of the state and on the right is the output from that
state.
If the input string is abab to the Moore machine then the output
will be 10010.
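The run of this Moore machine can be checked with a small simulation. The sketch below encodes the transition and output tables of the example as dictionaries; the function name run_moore and the encoding are choices made for this illustration.

```python
def run_moore(transitions, outputs, start, word):
    """Print the start-state character, then one character per state entered."""
    state = start
    printed = [outputs[state]]
    for letter in word:
        state = transitions[state][letter]
        printed.append(outputs[state])
    return "".join(printed)

transitions = {
    "q0": {"a": "q1", "b": "q3"},
    "q1": {"a": "q3", "b": "q1"},
    "q2": {"a": "q0", "b": "q3"},
    "q3": {"a": "q3", "b": "q2"},
}
outputs = {"q0": "1", "q1": "0", "q2": "0", "q3": "1"}

print(run_moore(transitions, outputs, "q0", "abab"))  # 10010
```

Note that the output string is always one character longer than the input string, because the start state prints before any letter is read.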
Example
The following Moore machine will "count" how many times the
substring aab occurs in a long input string.
[State diagram: states q0/0, q1/0, q2/0 and q3/1, with one a-edge and one b-edge leaving each state]
The number of substrings aab in the input string will be exactly the
number of 1's in the output string.
Input string          a    a    a    b    a    b    b    a    a    b
State            q0   q1   q2   q2   q3   q1   q0   q0   q1   q2   q3
Output           0    0    0    0    1    0    0    0    0    0    1
A Mealy machine is a collection of four things:
1. A finite set of states q0,q1,q2,… where q0 is designated as the start
state.
2. An alphabet of letters for forming the input string ∑= { a, b, c, …}.
3. An alphabet of possible output characters Г = { x, y, z, …}.
4. A pictorial representation with states represented by small circles
and directed edges indicating transitions between states. Each edge
is labeled with a compound symbol of the form i/o where i is an
input letter and o is an output character. Every state must have
exactly one outgoing edge for each possible input letter. The edge
we travel is determined by the input letter i; while traveling on the
edge we must print the output character o.
Example
The following Mealy machine prints out the 1's complement of an
input bit string.
[State diagram: a single state q0 with one loop labeled 1/0, 0/1]
If the input is 001010 the output is 110101. This is a case where the input
alphabet and output alphabet are both {0,1}.
Example
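The following Mealy machine is called the increment machine.
[State diagram with three states: start, no carry and carry]
start: 0/1 to no carry, 1/0 to carry
carry: 0/1 to no carry, 1/0 to carry
no carry: 0/0 and 1/1 back to no carry
If the input is 1011, fed to the machine in reverse order (least significant bit first), the output, read in reverse order, is 1100.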
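Both Mealy machines can be simulated with a few lines of Python. This is a sketch under stated assumptions: the edge encoding (state, input letter) → (next state, output character) and the helper add_one are illustrative, and the increment machine is assumed to read its input least significant bit first.

```python
def run_mealy(edges, start, word):
    """Follow one edge per input letter and print that edge's output character."""
    state, printed = start, []
    for letter in word:
        state, out = edges[state][letter]
        printed.append(out)
    return "".join(printed)

# 1's complement machine: a single state with the loop 1/0, 0/1.
complement = {"q0": {"0": ("q0", "1"), "1": ("q0", "0")}}

# Increment machine: bits are read least significant bit first.
increment = {
    "start":    {"0": ("no carry", "1"), "1": ("carry", "0")},
    "carry":    {"0": ("no carry", "1"), "1": ("carry", "0")},
    "no carry": {"0": ("no carry", "0"), "1": ("no carry", "1")},
}

def add_one(bits):
    """Reverse the bit string, run the machine, and reverse the output back."""
    return run_mealy(increment, "start", bits[::-1])[::-1]

print(run_mealy(complement, "q0", "001010"))  # 110101
print(add_one("1011"))                        # 1100
```

Unlike a Moore machine, a Mealy machine prints exactly one character per input letter, so input and output have the same length.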
Definition
Given the Mealy machine Me and the Moore machine Mo, which
prints the automatic start-state character x, we will say that these two
machines are equivalent if for every input string the output string from
Mo is exactly x concatenated with the output from Me.
Note: we prove that for every Moore machine there is an equivalent
Mealy machine and for every Mealy machine there is an equivalent
Moore machine. We can then say that the two types of machine are
completely equivalent.
Theorem
If Mo is a Moore machine, then there is a Mealy machine Me that is
equivalent to it.
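Proof
The proof will be by constructive algorithm. The character printed by a state is moved onto every edge that enters it: if a state q4 prints t, each incoming edge labeled with an input letter such as a or b is relabeled a/t or b/t, and the state keeps only its name.
[Figure: state q4/t with incoming edges a, b, b becomes state q4 with incoming edges a/t, b/t, b/t]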
Example
Below, a Moore machine is converted into a Mealy machine:
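[Figure: a Moore machine with states q0/0, q1/1, q2/0 and q3/1 becomes a Mealy machine with states q0, q1, q2 and q3, in which every edge entering q1 or q3 prints 1 and every edge entering q0 or q2 prints 0]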
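The Moore-to-Mealy construction is short enough to sketch in code. The example below reuses the first Moore machine of this section (states q0/1, q1/0, q2/0, q3/1) rather than the machine in the figure, and the function names are assumptions of this illustration. It also checks the definition of equivalence: the Moore output equals the start-state character followed by the Mealy output.

```python
def run_moore(transitions, outputs, start, word):
    state, printed = start, [outputs[start]]
    for letter in word:
        state = transitions[state][letter]
        printed.append(outputs[state])
    return "".join(printed)

def run_mealy(edges, start, word):
    state, printed = start, []
    for letter in word:
        state, out = edges[state][letter]
        printed.append(out)
    return "".join(printed)

def moore_to_mealy(transitions, outputs):
    """Move the character printed by each state onto every edge entering it."""
    return {q: {a: (r, outputs[r]) for a, r in row.items()}
            for q, row in transitions.items()}

moore_t = {
    "q0": {"a": "q1", "b": "q3"},
    "q1": {"a": "q3", "b": "q1"},
    "q2": {"a": "q0", "b": "q3"},
    "q3": {"a": "q3", "b": "q2"},
}
moore_o = {"q0": "1", "q1": "0", "q2": "0", "q3": "1"}
mealy = moore_to_mealy(moore_t, moore_o)

print(run_moore(moore_t, moore_o, "q0", "abab"))  # 10010
print(run_mealy(mealy, "q0", "abab"))             # 0010
```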
Theorem
For every Mealy machine Me there is a Moore machine Mo that is
equivalent to it.
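Proof
The proof will be by constructive algorithm. If every edge entering a state prints the same character, that character is moved into the state.
[Figure: state q4 with incoming edges a/1, b/1, b/1 becomes state q4/1 with incoming edges a, b, b]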
If there is more than one possibility for printing as we enter the state, then we need a copy of the state for each character we might have to print (we may need as many copies as there are characters in Г).
[Figure: state q4, entered by edges that print 0 and by edges that print 1, becomes two copies, q41/0 and q42/1; each incoming edge goes to the copy that prints its character, and every outgoing edge of q4 is repeated on both copies]
Example
Convert the following Mealy machine to Moore machine:
[Figure: Mealy machine with states q0, q1, q2 and q3]
[Figure: the equivalent Moore machine with states q01/0, q02/1, q1/1, q21/0, q22/1 and q3/0]
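The state-splitting idea can be sketched as follows. Each Moore state is represented as a pair (Mealy state, character printed on entering it), which plays the role of the copies q41/0 and q42/1. The example uses the increment machine from earlier in this section instead of the machine in the figure; the pair encoding, the function names, and the arbitrary start character "0" are assumptions of this sketch.

```python
def run_mealy(edges, start, word):
    state, printed = start, []
    for letter in word:
        state, out = edges[state][letter]
        printed.append(out)
    return "".join(printed)

def run_moore(transitions, outputs, start, word):
    state, printed = start, [outputs[start]]
    for letter in word:
        state = transitions[state][letter]
        printed.append(outputs[state])
    return "".join(printed)

def mealy_to_moore(edges, start, start_output):
    """Make one copy (q, o) of state q for each character o printed on entering q."""
    moore_start = (start, start_output)  # the start-state character is arbitrary
    transitions, outputs = {}, {}
    todo = [moore_start]
    while todo:
        pair = todo.pop()
        if pair in transitions:
            continue
        q, o = pair
        outputs[pair] = o
        # every outgoing edge of q is repeated on each copy of q
        transitions[pair] = {a: (r, out) for a, (r, out) in edges[q].items()}
        todo.extend(transitions[pair].values())
    return transitions, outputs, moore_start

increment = {
    "start":    {"0": ("no carry", "1"), "1": ("carry", "0")},
    "carry":    {"0": ("no carry", "1"), "1": ("carry", "0")},
    "no carry": {"0": ("no carry", "0"), "1": ("no carry", "1")},
}
moore_t, moore_o, moore_start = mealy_to_moore(increment, "start", "0")
print(len(moore_o))  # 4: "no carry" is split into two copies
```

Only copies that are actually reachable are created, so the number of Moore states is at most the number of Mealy states times the size of Г.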
Example
Draw the Mealy machine for the following sequential circuit:
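[Circuit diagram: the input feeds a NAND gate and an OR gate; A and B are the stored values, B being a delayed copy of A; the gates compute the new A and the output according to the rules given below]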
First we identify four states:
q0 is A= 0 B= 0
q1 is A= 0 B= 1
q2 is A= 1 B= 0
q3 is A= 1 B= 1
the operation of this circuit is such that after an input of 0 or 1 the state
changes according to the following rules:
new B= old A
new A = (input) NAND (old A OR old B)
output = (input) OR (old B)
Suppose we are in q0 and we receive the input 0.
new B = old A = 0
new A = 0 NAND (0 OR 0)
= 0 NAND 0
=1
output = 0 OR 0 = 0
the new state is q2 (since new A=1, new B=0)
if we are in state q0 and we receive the input 1:
new B= old A = 0
new A = 1 NAND (0 OR 0) =1
output = 1 OR 0 =1
the new state is q2.
We repeat this process for every state and for each input to produce the
following table:
Old state    After input 0            After input 1
             New state    Output      New state    Output
q0           q2           0           q2           1
q1           q2           1           q0           1
q2           q3           0           q1           1
q3           q3           1           q1           1
[State diagram of the resulting Mealy machine: states q0, q1, q2 and q3, with the edges i/o given by the table]
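The table can be generated mechanically from the three rules of the circuit. The following sketch applies the rules to every state and every input; the helper names nand, step and table are choices made for this illustration.

```python
def nand(x, y):
    """NAND of two bits."""
    return 1 - (x & y)

def step(a, b, inp):
    """Apply the circuit rules to the old values of A and B and one input bit."""
    new_b = a
    new_a = nand(inp, a | b)
    out = inp | b
    return new_a, new_b, out

name = {(0, 0): "q0", (0, 1): "q1", (1, 0): "q2", (1, 1): "q3"}

table = {}
for (a, b), q in name.items():
    for inp in (0, 1):
        new_a, new_b, out = step(a, b, inp)
        table[(q, inp)] = (name[(new_a, new_b)], out)

for key in sorted(table):
    print(key, "->", table[key])
```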
Comparison table for automata
         Start          Final           Edge labels                Number of edges             Deterministic    Output
         states         states                                     from each state
FA       one            some or none    letters from ∑             one for each letter in ∑    yes              no
TG       one or more    some or none    words from ∑*              arbitrary                   no               no
NFA      one            some or none    letters from ∑             arbitrary                   no               no
NFA-Λ    one            some or none    letters from ∑ or Λ        arbitrary                   no               no
Moore    one            none            letters from ∑             one for each letter in ∑    yes              yes
Mealy    one            none            i/o, i from ∑, o from Г    one for each letter in ∑    yes              yes
A phrase-Structure Grammar
A phrase-Structure Grammar, called PSG, is a collection of three things:
1. An alphabet ∑ of letters called terminals.
2. A set of symbols called nonterminals that includes the start symbol S.
3. A finite set of productions of the form:
String 1 → String 2
Where string 1 can be any string of terminals and nonterminals that contains
at least one nonterminals and where string 2 is any string of terminals and
nonterminals whatsoever.
Definition:
The language generated by the PSG is the set of all strings of terminals that
can be derived starting at S.
Example: the following is a phrase-structure grammar over Σ = {a, b} with
nonterminals X and S:
S → XS | Λ
X → aX | a
aaaX → ba
In this language we can have the following derivation:
S ⇒ XS ⇒ XXS ⇒ XXXS ⇒ XXX ⇒ aXXX ⇒ aaXXX ⇒ aaaXXX ⇒ baXX ⇒ baaXX ⇒ baaaX ⇒ bba
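A derivation in a phrase-structure grammar is a sequence of working strings in which each string is obtained from the previous one by replacing one occurrence of a left-hand side with the corresponding right-hand side. The sketch below checks the derivation of bba step by step. It assumes that S → Λ belongs to the grammar, which the step from XXXS to XXX requires; the function name one_step and the list encoding are illustrative.

```python
def one_step(before, after, productions):
    """True if 'after' results from applying one production once to 'before'."""
    for lhs, rhs in productions:
        i = before.find(lhs)
        while i != -1:
            if before[:i] + rhs + before[i + len(lhs):] == after:
                return True
            i = before.find(lhs, i + 1)
    return False

# The empty string stands for Λ (assumed production S -> Λ, see above).
productions = [("S", "XS"), ("S", ""), ("X", "aX"), ("X", "a"), ("aaaX", "ba")]

derivation = ["S", "XS", "XXS", "XXXS", "XXX", "aXXX", "aaXXX",
              "aaaXXX", "baXX", "baaXX", "baaaX", "bba"]

ok = all(one_step(x, y, productions) for x, y in zip(derivation, derivation[1:]))
print(ok)  # True
```

Note that the production aaaX → ba shortens the working string, something a context-free production with a nonempty right-hand side can never do.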
Example:
S → aSBA
S → abA
AB → BA
bB → bb
bA → ba
aA → aa
A context-sensitive grammar (CSG)
A context-sensitive grammar (CSG) is a formal grammar in which the
left-hand sides and right-hand sides of any production rules may be
surrounded by a context of terminal and nonterminal symbols.
Definition
A formal grammar G = (N, Σ, P, S), where N is the set of nonterminals, Σ is the set of terminals, P is the set of productions and S is the start symbol, is context-sensitive if all rules in P are of the form αAβ → αγβ
where A Є N (i.e., A is a single nonterminal),
α, β Є (N U Σ)* (α and β are strings of nonterminals and terminals)
and γ Є (N U Σ)+ (γ is a nonempty string of nonterminals and terminals).
In addition, a rule of the form S → λ, where λ represents the empty string, is permitted provided S does not appear on the right side of any rule.
Allowing the empty string makes it possible to state that the context-sensitive languages are a proper superset of the context-free languages, rather than having to make the weaker statement that all context-free grammars with no → λ productions are also context-sensitive grammars.
Example: This grammar generates the context sensitive language:
1.
2.
3.
4.
5.
6.
7.
8.
9.
CONTEXT FREE GRAMMAR
A context free grammar, called CFG, is a collection of three things:
1. An alphabet ∑ of letters called terminals from which we are going
to make strings that will be the words of a language.
2. A set of symbols called nonterminals, one of which is the symbol
S, standing for "start here".
3. A finite set of production of the form:
One nonterminal → finite string of terminals and/ or nonterminals
Where the strings of terminals and nonterminals can consist of only
terminals or of any nonterminals, or any mixture of terminals and
nonterminals or even the empty string. We require that at least one
production has the nonterminal S as its left side.
Definition
The language generated by the CFG is the set of all strings of terminals
that can be produced from the start symbol S using the production as
substitutions. A language generated by the CFG is called a context free
language (CFL).
Example
Let the only terminal be a.
Let the only nonterminal be S.
Let the production be:
S → aS
S→Λ
The language generated by this CFG is exactly a*.
In this language we can have the following derivation:
S → aS → aaS → aaaS → aaaaS → aaaaaS → aaaaaΛ = aaaaa
Example
Let the only terminal be a.
Let the only nonterminal be S.
Let the production be:
S → SS
S→a
S→Λ
The language generated by this CFG is also just the language a*.
In this language we can have the following derivation:
S → SS → SSS → SaS → SaSS → ΛaSS → ΛaaS → ΛaaΛ = aa
Example
Let the terminals be a, b. And the only nonterminal be S.
Let the production be:
S → aS
S → bS
S→a
S→b
The language generated by this CFG is (a+b)+.
In this language we can have the following derivation:
S → bS → baS → baaS → baab
Example
Let the terminals be a, b. And the only nonterminal be S.
Let the production be:
S → aS
S → bS
S→Λ
The language generated by this CFG is (a+b)*.
In this language we can have the following derivation:
S → bS → baS → baaS → baaΛ=baa
Example
Let the terminals be a, b,Λ. And the nonterminals be S,X,Y.
Let the production be:
S→X
S→Y
X→Λ
Y → aY
Y → bY
Y→a
Y→b
The language generated by this CFG is (a+b)*.
Example
Let the terminals be a, b,Λ. And the nonterminals be S,X,Y.
Let the production be:
S → XY
X→Λ
Y → aY
Y → bY
Y→a
Y→b
The language generated by this CFG is (a+b)+.
Example
Let the terminals be a, b.
Let the nonterminals be S,X.
Let the production be:
S → XaaX
X → aX
X → bX
X→Λ
The language generated by this CFG is (a+b)* aa(a+b)*.
To generate baabaab we can proceed as follows:
S→XaaX→bXaaX→baXaaX→baaXaaX→baabXaaX→baabΛaaX=baabaaX
→baabaabX→baabaabΛ=baabaab
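To see which words a CFG generates, one can enumerate leftmost derivations up to a length bound. The sketch below does this for the grammar S → XaaX, X → aX | bX | Λ and compares the result with the words of (a+b)* aa(a+b)* of length at most 4. It assumes that nonterminals are single uppercase letters and terminals are lowercase letters, and it prunes working strings longer than 2·max_len + 2, a bound that is sufficient for this grammar but is only a heuristic in general.

```python
from collections import deque

def words_up_to(productions, start, max_len):
    """Words of length <= max_len found by breadth-first leftmost derivation."""
    words, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:            # no nonterminal left: the form is a word
            words.add(form)
            continue
        for rhs in productions[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            terminals = sum(1 for c in new if c.islower())
            if terminals <= max_len and len(new) <= 2 * max_len + 2 and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

grammar = {"S": ["XaaX"], "X": ["aX", "bX", ""]}  # "" stands for Λ
words = words_up_to(grammar, "S", 4)
print(sorted(words, key=lambda w: (len(w), w)))
```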
Example
Let the terminals be a, b.
Let the nonterminals be S,X,Y.
Let the production be:
S → XY
X → aX
X → bX
X→a
Y → Ya
Y → Yb
Y→a
The language generated by this CFG is (a+b)* aa(a+b)*.
To derive babaabb:
S→XY→bXY→baXY→babXY→babaY→babaYb→babaYbb→babaabb
Example
Let the terminals be a, b.
Let the nonterminals be S,BALANCED,UNBALANCED.
Let the production be:
S → SS
S → BALANCED S
S → S BALANCED
S→Λ
S → UNBALANCED S UNBALANCED
BALANCED→ aa
BALANCED→ bb
UNBALANCED → ab
UNBALANCED → ba
The language generated by this CFG is even-even.
Example
Let the terminals be a, b.
Let the nonterminals be S,A,B.
Let the production be:
S → aB
S → bA
A→a
A → aS
A → bAA
B→b
B → bS
B → aBB
The language generated by this CFG is the language EQUAL of all
strings that have an equal number of a's and b's.
H.W
Find the RE for the following CFG:
S → XS| Λ
X → ZY
Z → abZ| baZ| ab| ba
Y → aa| bb
The empty string in context-free grammars
Let G = (N,T,P,S) be any CFG with Λ-productions.
Then G’ = (N,T,P’,S), where P’ is constructed from P as follows:
1. Put all the Λ-free productions of P into P’.
2. Find all the nonterminals A Є N such that A ⇒* Λ.
Theorem: For any CFL,L, there exists an Λ-free CFG,G, such that
L(G) = L\{Λ}
Example:
S → [E] | E
E → T | E+T | E-T
T → F | T*F | T/F
F → a | b | c | Λ
The Λ-free grammar constructed from G has productions
S → [E] | E | [ ]
E → T | E+T | E-T | E+ | E- | +T | -T | + | -
T → F | T*F | T/F | T* | T/ | *F | /F | * | /
F → a | b | c
Trees
Example
S → AA
A → AAA | bA | Ab | a
If we want to produce the word bbaaaab, the tree will be:
[Derivation tree for the word bbaaaab]
This diagram is called syntax tree or parse tree or generation tree or
production tree or derivation tree.
Example
S → (S) | S&S | ~S | p| q
The derivation tree for the word (~ ~ p & ( p & ~ ~ q )) will be:
[Derivation tree for the word (~ ~ p & ( p & ~ ~ q ))]
Example
S → S + S | S* S | number
Does the expression 3+4*5 mean (3+4)*5 which is 35 or does it mean
3+(4*5) which is 23?
We can distinguish between these two possible meaning for the
expression 3+4*5 by looking at the two possible derivation trees that
might have produced it.
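[Two derivation trees for 3+4*5. In the first, the root production is S → S+S and the right-hand S derives 4*5; evaluating from the leaves up gives 4*5 = 20 and then 3+20 = 23. In the second, the root production is S → S*S and the left-hand S derives 3+4; evaluating from the leaves up gives 3+4 = 7 and then 7*5 = 35.]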
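The two readings of 3+4*5 can be found mechanically by trying every operator as the root of the derivation tree. The following sketch enumerates all derivation trees of a token list under S → S+S | S*S | number and evaluates each one from the leaves up; the function name values and the token-list encoding are assumptions of this example.

```python
def values(tokens):
    """Values of all derivation trees of tokens under S -> S+S | S*S | number."""
    if len(tokens) == 1:
        return [int(tokens[0])]
    results = []
    for i in range(1, len(tokens), 2):      # operators sit at the odd positions
        for left in values(tokens[:i]):
            for right in values(tokens[i + 1:]):
                if tokens[i] == "+":
                    results.append(left + right)
                else:
                    results.append(left * right)
    return results

print(values(["3", "+", "4", "*", "5"]))  # [23, 35]
```

The first value comes from the tree rooted at +, the second from the tree rooted at *.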
Example
S → AB
A→a
B→b
S → AB → aB → ab or S → AB → Ab → ab
[Both derivations correspond to the same derivation tree: S has children A and B, where A yields a and B yields b]
There is no ambiguity of interpretation.
Definition
A CFG is called ambiguous if for at least one word in the language that it
generates there are two possible derivations of the word that correspond
to different syntax trees.
Example
The CFG for palindrome
S → aSa | bSb |a |b | Λ
S → aSa → aaSaa → aabaa
[Derivation tree for aabaa]
The CFG is unambiguous.
Example
S → aS |Sa | a
In this case the word a^3 can be generated by four different trees:
[Four derivation trees for aaa, one for each way of choosing between S → aS and S → Sa in the first two steps]
The CFG is therefore ambiguous.
The same language can be defined by the CFG:
S → aS | a
For which the word a^3 has only one production tree:
[Derivation tree: S → aS → aaS → aaa]
This CFG is not ambiguous.
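Ambiguity can be tested for a particular word by counting its derivation trees. The sketch below counts trees by splitting the word among the symbols of each right-hand side. It assumes single-letter symbols (uppercase for nonterminals, lowercase for terminals) and a grammar without Λ-productions and without unit productions, which holds for both grammars of this example; the function names are illustrative.

```python
from functools import lru_cache

def count_trees(productions, start, word):
    """Number of derivation trees of word from start (no Λ or unit productions)."""

    @lru_cache(maxsize=None)
    def ways_symbol(sym, s):
        if sym.islower():                       # a terminal derives only itself
            return 1 if s == sym else 0
        return sum(ways_string(rhs, s) for rhs in productions[sym])

    @lru_cache(maxsize=None)
    def ways_string(rhs, s):
        if not rhs:
            return 1 if not s else 0
        if len(rhs) > len(s):                   # every symbol needs >= 1 letter
            return 0
        head, rest = rhs[0], rhs[1:]
        return sum(ways_symbol(head, s[:k]) * ways_string(rest, s[k:])
                   for k in range(1, len(s) - len(rest) + 1))

    return ways_symbol(start, word)

print(count_trees({"S": ["aS", "Sa", "a"]}, "S", "aaa"))  # 4
print(count_trees({"S": ["aS", "a"]}, "S", "aaa"))        # 1
```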
Definition
For a given CFG we define a tree with the start symbol S as its root and
whose nodes are working strings of terminals and nonterminals. The
descendants of each node are all possible results of applying every
production to the working string, one at a time. A string of all terminals is
a terminal node in the tree. The resultant tree is called the total language
tree of the CFG.
Example
For the CFG
S → aa | bX | aXX
X → ab | b
The total language tree is:
S
    aa
    bX
        bab
        bb
    aXX
        aabX
            aabab
            aabb
        abX
            abab
            abb
        aXab
            aabab
            abab
        aXb
            aabb
            abb
The total language has only seven different words. Four of its words
(abb, aabb, abab, aabab) have two different possible derivations, because
they appear as terminal nodes in this tree in two different places.
However, these words are not generated by two different derivation trees,
and the grammar is unambiguous. For example:
[Derivation tree for aabb: S → aXX, where the first X → ab and the second X → b]
Example
Consider the CFG:
S → aSb | bS | a
The total tree of this language begins:
S
    aSb
        aaSbb
            aaaSbbb
            aabSbb
            aaabb
        abSb
        aab
    bS
        baSb
        bbS
        ba
    a
The trees may get arbitrarily wide as well as infinitely long.
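A total language tree can be built level by level: every nonterminal occurrence in a working string is replaced by every possible right-hand side, one at a time. The sketch below does this for the finite example S → aa | bX | aXX, X → ab | b and counts how often each word appears as a terminal node. The depth limit max_depth is an assumption added so that the function also stops on grammars whose tree is infinite, such as S → aSb | bS | a.

```python
from collections import Counter

def terminal_nodes(productions, start, max_depth=10):
    """Count the terminal nodes of the total language tree, level by level."""
    leaves, level = Counter(), [start]
    for _ in range(max_depth):
        next_level = []
        for form in level:
            if not any(c.isupper() for c in form):
                leaves[form] += 1               # a string of all terminals
                continue
            for i, c in enumerate(form):
                if c.isupper():                 # apply every production here
                    for rhs in productions[c]:
                        next_level.append(form[:i] + rhs + form[i + 1:])
        level = next_level
    return leaves

leaves = terminal_nodes({"S": ["aa", "bX", "aXX"], "X": ["ab", "b"]}, "S")
print(len(leaves))                               # 7 different words
print(sorted(w for w in leaves if leaves[w] == 2))
```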
linear grammar
A grammar is linear if it is context-free and all of its productions' right
hand sides have at most one nonterminal.
A linear language is a language generated by some linear grammar.
Example
A simple linear grammar is G with N = {S}, Σ = {a, b}, P with start symbol
S and rules
S → aSb
S→Λ
It generates the language {a^n b^n : n ≥ 0}.
Relationship with regular grammars:
Two special types of linear grammars are the following:
1. the left-linear or left regular grammars, in which all nonterminals
in right hand sides are at the left ends;
2. the right-linear or right regular grammars, in which all
nonterminals in right hand sides are at the right ends.
These two special types of linear grammars are known as the regular
grammars; both can describe exactly the regular languages.
Another special type of linear grammar is the linear grammars in which all
nonterminals in right hand sides are at the left or right ends, but not
necessarily all at the same end.
By inserting new nonterminals, every linear grammar can be brought into
this form without affecting the language generated. For instance, the rules of
G above can be replaced with
S → aA
A → Sb
S→Λ
REGULAR GRAMMARS
Note : all regular languages can be generated by CFG's, and so can some
non-regular languages but not all possible languages.
Example
Consider the FA below, which accepts the language of all words
with a double a:
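[State diagram: start state -S, state M and final state +F; the transitions are S on a to M, S on b to S, M on a to F, M on b to S, and F on a or b to F]
All that is necessary to convert it to a CFG is that:
1. every edge between states becomes a production: an edge labeled c from state X to state Y becomes X → cY
and
2. every production corresponds to an edge between states: X → cY comes from an edge labeled c from state X to state Y,
or to the possible termination at a final state: X → Λ, only when X is a final state.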
So the production rules of our example will be:
S → aM | bS
M → aF | bS
F →aF | bF | Λ
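The two conversion rules translate directly into code. In this sketch the FA for words with a double a is stored as a dictionary of transitions; the function name fa_to_grammar and the encoding are assumptions of this illustration.

```python
def fa_to_grammar(transitions, finals):
    """One production X -> cY per edge, plus X -> Λ for every final state X."""
    grammar = {}
    for state, row in transitions.items():
        rules = [letter + target for letter, target in sorted(row.items())]
        if state in finals:
            rules.append("Λ")
        grammar[state] = rules
    return grammar

fa = {
    "S": {"a": "M", "b": "S"},
    "M": {"a": "F", "b": "S"},
    "F": {"a": "F", "b": "F"},
}
grammar = fa_to_grammar(fa, finals={"F"})
for head, rules in grammar.items():
    print(head, "→", " | ".join(rules))
```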
Definition
For a given CFG a semiword is a string of terminals (maybe none)
concatenated with exactly one nonterminal (on the right), for example:
(terminal) (terminal) . . . (terminal) (Nonterminal)
Example
Consider the following FA with two final states:
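[State diagram: start and final state S±, final state M+ and non-final state F, with the same edges as in the previous example]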
So the production rules of our example will be:
S → aM | bS | Λ
M → aF | bS | Λ
F →aF | bF
Theorem
All regular languages can be generated by CFG's. This can be stated as:
All regular languages are CFL's.
Example
The language of all words with an even number of a's (with at least some
a's) is regular since it can be accepted by this FA:
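[State diagram: start state S-, state M and final state F+; the transitions are S on a to M, S on b to S, M on a to F, M on b to M, F on a to M, and F on b to F]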
We have the following set of productions:
S → aM | bS
M → aF | bM
F →aM | bF | Λ
Theorem
If all the productions in a given CFG fit one of the two forms:
Nonterminal→ semiword
Or
Nonterminal → word
(where the word may be Λ) then the language generated by this CFG is
regular.
Proof
We shall prove that the language generated by such a CFG is regular by
showing that there is a TG that accepts the same language. We shall build
this TG by constructive algorithm.
Let us consider a general CFG in the form:
N1 → w1N2
N40 → w10
N41 → w23
N1 → w2N3
N2 → w3N4
...
...
Where N's are the nonterminals and w's are strings of terminals, and the
parts wyNz are the semiwords used in productions. One of these N's must
be S. Let N1 = S.
Draw a small circle for each N and one extra circle labeled +. The circle
for S we label - .
S-
N2
N3 . . . Nx
...
+
Draw a directed edge from state Nx to Nz and label it with the word wy.
wy
Nx
Nz
If the two nonterminals above are the same the path is a loop.
For every production rule of the form:
Np → wq
Draw a directed edge from Np to + and label it with the word wq.
Np
wq
+
We have now constructed a transition graph.
Definition
A CFG is called a regular grammar if each of its productions is of one
of the two forms:
Nonterminal → semiword
Nonterminal → word
Example
Consider the CFG:
S → aaS | bbS | Λ
It is a regular grammar and the whole TG is:
-
Λ
+
aa,bb
It corresponds to the regular expression:
(aa+bb)*
Example
Consider the CFG:
S → aaS | bbS | abX | baX |Λ
X → aaX | bbX | abS | baS
The TG for even-even is:
ab,ba
±
ab,ba
aa,bb
X
aa,bb
Example
Consider the CFG:
S → aA | bB
A → aS | a
B → bS | b
The corresponding TG is:
a
-
A
a
a
+
b
b
B
b
This language can be defined by the regular expression: (aa+bb)+
H.W
Find CFG that generate the regular language over the alphabet ∑={a,b} of all strings
without the substring aaa.
CHOMSKY NORMAL FORM (CNF)
Theorem
If L is a context-free language generated by a CFG that includes Λ-productions, then there is a different CFG that has no Λ-productions that
generates either the whole language L (if L does not include the word Λ)
or else the language of all the words in L that are not Λ.
Definition
In a given CFG, we call a nonterminal N nullable if:
• There is a production: N→ Λ
Or
• There is a derivation that start at N and leads to Λ: N→…→ Λ
Example
Consider the CFG:
S → a| Xb| aYa
X→Y|Λ
Y→b|X
X and Y are nullable.
The new CFG is:
S → a| Xb| aYa| b| aa
X→Y
Y→b|X
Example
Consider the CFG:
S → Xa
X → aX| bX | Λ
X is the only nullable nonterminal.
The new CFG is:
S → Xa| a
X → aX| bX |a| b
Example
Consider the CFG:
S → XY
X → Zb
Y→ bW
Z → AB
W→ Z
A → aA| bA| Λ
B → Ba| Bb| Λ
A, B, W and Z are nullable.
The new CFG is:
S → XY
X → Zb| b
Y→ bW| b
Z → AB| A| B
W→ Z
A → aA| bA| a| b
B → Ba| Bb| a| b
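The elimination of Λ-productions used in these examples can be sketched in two steps: first find the nullable nonterminals, then add, for each production, every version obtained by deleting some of its nullable symbols. The sketch assumes single-letter symbols and writes Λ as the empty string; the function names are illustrative.

```python
from itertools import combinations

def nullable_set(productions):
    """Nonterminals N with N -> Λ or N -> ... -> Λ."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in nullable and any(
                    all(s in nullable for s in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable

def remove_lambda(productions):
    """Drop Λ-productions and add every variant with nullable symbols deleted."""
    nullable = nullable_set(productions)
    result = {}
    for head, bodies in productions.items():
        new_bodies = []
        for body in bodies:
            spots = [i for i, s in enumerate(body) if s in nullable]
            for r in range(len(spots) + 1):
                for drop in combinations(spots, r):
                    candidate = "".join(s for i, s in enumerate(body) if i not in drop)
                    if candidate and candidate not in new_bodies:
                        new_bodies.append(candidate)
        result[head] = new_bodies
    return result

g = {"S": ["XY"], "X": ["Zb"], "Y": ["bW"], "Z": ["AB"], "W": ["Z"],
     "A": ["aA", "bA", ""], "B": ["Ba", "Bb", ""]}
print(sorted(nullable_set(g)))   # ['A', 'B', 'W', 'Z']
print(remove_lambda(g)["Z"])
```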
Definition
A production of the form: One Nonterminal → One Nonterminal
is called a unit production.
Theorem
If there is a CFG for the language L that has no Λ-production, then there
is also a CFG for L with no Λ-production and no unit production.
Example
Consider the CFG:
S → A| bb
A → B| b
B → S| a
S → A          gives  S → b
S → A → B      gives  S → a
A → B          gives  A → a
A → B → S      gives  A → bb
B → S          gives  B → bb
B → S → A      gives  B → b
The new CFG for this language is:
S → bb| b| a
A → b| a| bb
B → a| bb| b
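The procedure of the example can be sketched as follows: for each nonterminal, follow chains of unit productions to find every nonterminal it can reach, then copy the non-unit productions of those nonterminals. The sketch assumes that a right-hand side is a unit production exactly when it equals the name of a nonterminal; the function name is illustrative.

```python
def remove_unit(productions):
    """Replace unit productions by the non-unit productions they lead to."""
    nonterminals = set(productions)
    result = {}
    for head in productions:
        reachable, todo = {head}, [head]
        while todo:                          # follow chains such as S -> A -> B
            n = todo.pop()
            for body in productions[n]:
                if body in nonterminals and body not in reachable:
                    reachable.add(body)
                    todo.append(body)
        result[head] = sorted({body for n in reachable
                               for body in productions[n]
                               if body not in nonterminals})
    return result

g = {"S": ["A", "bb"], "A": ["B", "b"], "B": ["S", "a"]}
print(remove_unit(g))
```

For this grammar every nonterminal reaches every other one through unit productions, so all three end up with the same right-hand sides a, b and bb.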
Theorem
If L is a language generated by some CFG then there is another CFG that
generates all the non- Λ words of L, all of these productions are of one of
two basic forms:
Nonterminal → string of only Nonterminals
Or
Nonterminal → One Terminal
Example
Consider the CFG:
S → X1| X2aX2| aSb| b
X1 → X2X2| b
X2 → aX2| aaX1
Becomes:
S →X1
S →X2AX2
S →ASB
S →B
X1 → X2X2
X1 → B
X2 → AX2
X2 → AAX1
A→ a
B→b
Example
Consider the CFG:
S → Na
N → a| b
Becomes:
S → NA
N → a| b
A→a
Theorem
For any CFL the non- Λ words of L can be generated by a grammar in
which all productions are of one of two forms:
Nonterminal → string of exactly two Nonterminals
Or
Nonterminal → One Terminal
Definition
If a CFG has only productions of the form:
Nonterminal → string of two Nonterminals
Or of the form:
Nonterminal → One Terminal
It is said to be in Chomsky Normal Form (CNF).
Example
Convert the following CFG into CNF:
S→ASA
S→ BSB
S→AA
S→BB
S→a
S→b
A→a
B→b
The CNF:
S→AR1
R1→SA
S→ BR2
R2→ SB
S→AA
S→BB
S→a
S→b
A→a
B→b
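(The grammar given at the start of this example is what S → aSa | bSb | a | b | aa | bb becomes once A → a and B → b are introduced for the terminals in the long productions.)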
Example
Convert the following CFG into CNF:
S→bA| aB
A→ bAA| aS| a
B→aBB| bS| b
The CNF:
S→YA| XB
A→ YR1| XS| a
B→XR2| YS| b
X→ a
Y→ b
R1→ AA
R2→ BB
Example
Convert the following CFG into CNF:
S→ AAAAS
S→ AAAA
A→ a
The CNF:
S→ AR1
R1→ AR2
R2 → AR3
R3→ AS
S → AR4
R4→ AR5
R5→ AA
A→ a
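The last step of the conversion, breaking long right-hand sides into pairs, can be sketched as below. The sketch assumes the earlier steps are already done, so every right-hand side longer than one symbol contains only nonterminals, and it names the helper nonterminals R1, R2, … in the order they are created, as in the example.

```python
def binarize(productions):
    """Split every right-hand side of more than two nonterminals into pairs."""
    result, counter = [], 0
    for head, body in productions:
        while len(body) > 2:
            counter += 1
            helper = f"R{counter}"
            result.append((head, [body[0], helper]))   # head -> first helper
            head, body = helper, body[1:]              # helper derives the rest
        result.append((head, body))
    return result

cfg = [("S", list("AAAAS")), ("S", list("AAAA")), ("A", ["a"])]
for head, body in binarize(cfg):
    print(head, "→", " ".join(body))
```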
Greibach normal form
A context-free grammar is in Greibach normal form (GNF) if the
right-hand sides of all productions start with a terminal symbol. More precisely, a
context-free grammar is in Greibach normal form if all production rules
are of the form:
A → aX
where A is a nonterminal symbol,
a is a terminal symbol,
X is a (possibly empty) sequence of nonterminal symbols not
including the start symbol.
Example 1: convert the following CFG to GNF:
S → AB
A → BS | b
B → SA | a
1. Convert the CFG to CNF (this grammar is already in CNF).
2. Rename the nonterminals:
S becomes A1
A becomes A2
B becomes A3
The grammar becomes:
A1 → A2A3
A2 → A3A1 | b
A3 → A1A2 | a
3. Compare the values of i and j in each production Ai → Aj…; we need j > i:
A1 → A2A3                        2 > 1
A2 → A3A1 | b                    3 > 2
A3 → A1A2 | a                    1 < 3
Substituting for A1, and then for A2, in the A3 production:
A3 → A2A3A2 | a                  2 < 3
A3 → A3A1A3A2 | bA3A2 | a        3 = 3
Removing the left recursion with a new nonterminal B3:
B3 → A1A3A2 | A1A3A2B3
A3 → bA3A2 | bA3A2B3 | a | aB3
Substituting back into the remaining productions:
A2 → bA3A2A1 | aA1 | bA3A2B3A1 | aB3A1 | b
A1 → bA3A2A1A3 | aA1A3 | bA3A2B3A1A3 | aB3A1A3 | bA3
B3 → bA3A2A1A3A3A2 | aA1A3A3A2 | bA3A2B3A1A3A3A2 | aB3A1A3A3A2 | bA3A3A2 |
     bA3A2A1A3A3A2B3 | aA1A3A3A2B3 | bA3A2B3A1A3A3A2B3 | aB3A1A3A3A2B3 | bA3A3A2B3
Example 2:
S → ASB | AB
A → a
B → b
Convert to CNF:
S → AR1 | AB
R1 → SB
A → a
B → b
Rename the nonterminals:
S becomes A1
R1 becomes A2
A becomes A3
B becomes A4
The grammar becomes:
A1 → A3A2 | A3A4        3 > 1
A2 → A1A4               1 < 2
A3 → a
A4 → b
Substituting for A1 in the A2 production:
A2 → A3A2A4 | A3A4A4
The GNF:
A2 → aA2A4 | aA4A4
A1 → aA2 | aA4
A3 → a
A4 → b
Chomsky hierarchy
The Chomsky hierarchy consists of the following levels:
Type-0: grammars (unrestricted grammars) include all formal grammars.
Type-1: grammars (context-sensitive grammars) generate the context-sensitive languages.
Type-2: grammars (context-free grammars) generate the context-free
languages.
Type-3 grammars (regular grammars) generate the regular languages.
Every regular language is context-free, every context-free language, not
containing the empty string, is context-sensitive and every context-sensitive
language is recursive and every recursive language is recursively
enumerable.
The following table summarizes each of Chomsky's four types of grammars,
the class of language it generates, the type of automaton that recognizes it,
and the form its rules must have.
Grammar    Languages                 Automaton                                          Production rules (constraints)
Type-0     Recursively enumerable    Turing machine                                     α → β (no restrictions)
Type-1     Context-sensitive         Linear-bounded non-deterministic Turing machine    αAβ → αγβ
Type-2     Context-free              Non-deterministic pushdown automaton               A → γ
Type-3     Regular                   Finite state automaton                             A → a and A → aB
PUSHDOWN AUTOMATA (PDA)
Definition
A PDA is a collection of eight things:
1. An alphabet ∑ of input letters.
2. An input TAPE (infinite in one direction). Initially the string of
input letters is placed on the TAPE starting in cell i. The rest of the
TAPE is blanks.
3. An alphabet Гof STACK characters.
4. A pushdown STACK (infinite in one direction). Initially the
STACK is empty (contains all blanks)
5. One START state that has only out-edges, no in-edges.
START
6. HALT states of two kinds: some ACCEPT and some REJECT they
have in-edges and no out-edges.
REJECT
ACCEPT
7. Finitely many non branching PUSH states that introduce characters
onto the top of the STACK. they are of the form:
PUSH X
Where X is any letter in Γ.
8. Finitely many branching states of two kinds:
i. States that read the next unused letter from the TAPE.
READ
Which may have out-edges labeled with letters from ∑ and the
blank character ∆, with no restrictions on duplication of labels
and no insistence that there be a label for each letter of ∑, or ∆.
ii. States that read the top character of STACK.
POP
Which may have out-edges labeled with letters from Гand the
blank character ∆, again with no restrictions.
Note: we require that the states be connected so as to become a connected
directed graph.
Theorem
For every regular language L there is some PDA that accepts it.
Proof
Since L is regular, so it is accepted by some FA, then we can convert FA
to PDA (as in the following example).
Example
a
-
+
b
b
Becomes:
a
START
b
b
READ
a
READ
∆
ACCEPT
a
∆
REJECT
Example
-
a
a
+
b
a,b
b
Becomes:
START
a,b
b
b
READ
∆
REJECT
a
READ
a
READ
∆
ACCEPT
∆
REJECT
Note: we can find PDA accepts some non regular languages(as in the
following example).
Example
The language accepted by this PDA is exactly: {a^n b^n, n = 0, 1, 2, …}
[Two PDA diagrams for this language: the first pushes the letter a onto the STACK (PUSH a), the second pushes the STACK character X (PUSH X); both use READ and POP states leading to ACCEPT or REJECT]
[Venn diagram: the languages accepted by FA, NFA or TG lie inside the languages accepted by deterministic PDA, which lie inside the languages accepted by nondeterministic PDA]
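The idea behind a PDA for {a^n b^n} can be sketched with an explicit stack: push one character for every a that is read, pop one for every b, and accept only if the TAPE and the STACK run out together. This is an illustrative sketch of the idea, not a state-by-state transcription of the diagrams; the function name and the use of None for the blank ∆ are assumptions.

```python
def accepts_anbn(word):
    """Stack-based recognizer for a^n b^n, n = 0, 1, 2, ..."""
    stack = []
    tape = iter(word)
    letter = next(tape, None)      # None plays the role of the blank ∆
    while letter == "a":           # READ an a, then PUSH
        stack.append("a")
        letter = next(tape, None)
    while letter == "b":           # READ a b, then POP
        if not stack:              # POP finds ∆: more b's than a's
            return False
        stack.pop()
        letter = next(tape, None)
    # accept only if the TAPE is used up and the STACK is empty
    return letter is None and not stack

for w in ["", "ab", "aabb", "aab", "abab", "ba"]:
    print(repr(w), accepts_anbn(w))
```

No FA can do this job, because the number of a's read so far must be remembered, and that number is not bounded.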
Example
Consider the palindrome X language, the language of all words of the form
s X reverse(s), where s is any string in (a+b)*, such as {X aXa bXb aaXaa
abXba aabXbaa …}
START
a
PUSH a
a
X
READ
a
READ
b
∆
b
PUSH b
∆
ACCEPT
POP
b
POP
POP
Example
odd palindrome ={a b aaa aba bab bbb …}
START
a
PUSH a
a
READ
a,b
READ
∆
b
PUSH b
ACCEPT
∆
a
POP
b
b
POP
POP
Nondeterministic PDA
Example
Consider the language generated by CFG:
S → S+S | S*S | 4
[PDA diagram: START, PUSH1 S, a central POP state, READ1 to READ4, the PUSH states PUSH2 S, PUSH3 +, PUSH4 S, PUSH5 S, PUSH6 * and PUSH7 S, and ACCEPT]
Now we trace the acceptance of the string: 4+4*4
State      Stack    Tape
start      ∆        4+4*4
push1 S    S        4+4*4
pop        ∆        4+4*4
push2 S    S        4+4*4
push3 +    +S       4+4*4
push4 S    S+S      4+4*4
pop        +S       4+4*4
read1      +S       +4*4
pop        S        +4*4
read2      S        4*4
pop        ∆        4*4
push5 S    S        4*4
push6 *    *S       4*4
push7 S    S*S      4*4
pop        *S       4*4
read1      *S       *4
pop        S        *4
read3      S        4
pop        ∆        4
read1      ∆        ∆
pop        ∆        ∆
read4      ∆        ∆
accept     ∆        ∆
H.W
Find a PDA that accepts the language: {a^m b^n a^n, m = 1, 2, 3, …, n = 1, 2, 3, …}
TURING MACHINE
Definition
A Turing machine (TM) is a collection of six things:
1. An alphabet ∑ of input letters.
2. A TAPE divided into a sequence of numbered cells each
containing one character or a blank.
3. A TAPE HEAD that can in one step read the contents of a cell on
the TAPE, replace it with some other character, and reposition
itself to the next cell to the right or to the left of the one it has just
read.
4. An alphabet Г of characters that can be printed on the TAPE by the
TAPE HEAD.
5. A finite set of states including exactly one START state from
which we begin execution, and some (may be none) HALT states
that cause execution to terminate when we enter them. The other
states have no functions, only names: q1, q2, … or 1, 2, 3, …
6. A program, which is a set of rules that tell us on the basis of the
letter the TAPE HEAD has just read, how to change states, what to
print and where to move the TAPE HEAD. We depict the program
as a collection of directed edge connecting the states. Each edge is
labeled with a triplet of information: (letter, letter, direction). The
first letter (either ∆ or from ∑ or Γ) is the character that the TAPE
HEAD reads from the cell to which it is pointing, the second letter
(also ∆ or from Γ) is what the TAPE HEAD prints in the cell
before it leaves, the third component, the direction, tells the TAPE
HEAD whether to move one cell to the right(R) or to the left (L).
Note: TM is deterministic. This means that there is no state q that has two
or more edges leaving it labeled with the same first letter. For example,
the following TM is not allowed:
q2
(a,a,R)
q1
(a,b,L)
q3
Example
Find a TM that accepts the language defined by the regular expression:
(a+b)b(a+b)*
(a,a,R)
(b,b,R)
START 1
(∆,∆,R)
2
3
HALT 4
(b,b,R)
(a,a,R),
(b,b,R)
Now we trace the acceptance of the string: aba
1→2→3 → 3 → 4
aba aba aba∆ aba∆ aba∆∆
Example
Find a TM that accepts the language {a^n b^n}
(a,a,R), (B,B,R)
START 1
(a,A,R)
2
(B,B,L)
(b,B,L)
3
(B,B,R)
(A,A,R)
5
(∆,∆,R)
(A,A,R)
4
(a,a,L)
(a,a,L)
HALT
Now we trace the acceptance of the string: aaabbb
aaabbb Aaabbb Aaabbb Aaabbb AaaBbb AaaBbb AaaBbb AaaBbb
AAaBbb AAaBbb AAaBbb AAaBBb AAaBBb AAaBBb AAaBBb
AAABBb AAABBb AAABBb AAABBB AAABBB AAABBB
AAABBB AAABBB AAABBB AAABBB∆ HALT
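A TM program is a table of edges, so it can be simulated directly. The sketch below encodes each edge as (state, letter read) → (letter printed, direction, new state). The wiring of the edges between the states is an assumption: it is reconstructed from the edge labels of the diagram and from the trace above, and should be checked against the original figure. A missing edge, or a move off the left end of the TAPE, is treated as a crash (rejection).

```python
BLANK = "∆"

# Assumed wiring of the a^n b^n machine (states 1 to 5 and HALT).
PROGRAM = {
    (1, "a"): ("A", "R", 2),
    (2, "a"): ("a", "R", 2), (2, "B"): ("B", "R", 2), (2, "b"): ("B", "L", 3),
    (3, "B"): ("B", "L", 3), (3, "a"): ("a", "L", 4), (3, "A"): ("A", "R", 5),
    (4, "a"): ("a", "L", 4), (4, "A"): ("A", "R", 1),
    (5, "B"): ("B", "R", 5), (5, BLANK): (BLANK, "R", "HALT"),
}

def run_tm(program, word, start=1, max_steps=10_000):
    """Return True if the machine reaches HALT, False if it crashes."""
    tape, head, state = list(word), 0, start
    for _ in range(max_steps):
        if state == "HALT":
            return True
        if head < 0:
            return False                 # moved off the left end: crash
        if head == len(tape):
            tape.append(BLANK)           # the rest of the TAPE is blank
        key = (state, tape[head])
        if key not in program:
            return False                 # no edge for this letter: crash
        write, direction, state = program[key]
        tape[head] = write
        head += 1 if direction == "R" else -1
    return False                         # step limit reached (assumed reject)

for w in ["ab", "aaabbb", "aab", "abb", "ba", ""]:
    print(repr(w), run_tm(PROGRAM, w))
```

With this wiring the machine accepts a^n b^n only for n ≥ 1, since state 1 has no edge for a blank.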
Example
Find a TM that accepts the language palindrome.
(a,a,R), (b,b,R)
(a,∆,R)
2
(b,b,L), (a,a,L)
(∆,∆,L)
(a,∆,L)
3
(∆,∆,R)
4
(∆,∆,R)
(∆,∆,R)
START 1
HALT 8
(∆,∆,R)
(a,a,R), (b,b,R)
(b,∆,R)
5
(∆,∆,L)
6
(b,b,L), (a,a,L)
(b,∆,L)
(∆,∆,R)
7
Let us trace the running of this TM on the input string: ababa
(each entry gives the state followed by the TAPE contents)
1 ababa → 2 ∆baba → 2 ∆baba → 2 ∆baba → 2 ∆baba → 2 ∆baba∆
→ 3 ∆baba → 4 ∆bab∆ → 4 ∆bab∆ → 4 ∆bab∆ → 4 ∆bab∆
→ 1 ∆bab∆ → 5 ∆∆ab∆ → 5 ∆∆ab∆ → 5 ∆∆ab∆
→ 6 ∆∆ab∆ → 7 ∆∆a∆∆ → 7 ∆∆a∆∆
→ 1 ∆∆a∆∆ → 2 ∆∆∆∆∆ → 3 ∆∆∆∆∆ → 8 HALT
H.W
Find TM for even-even.