UNIT – I

Introduction to Data Structures:-

Ex.:- 1942
Taken alone, 1942 may be a year, or it may be a number, a vehicle number or a book number.
Information + context = knowledge
Context 1:- 1942, "A freedom struggle movement"
Context 2:- 1942, "A Love Story"

Short definition:- Data structure means organization of data.
Ex.:- Arranging books on a bookshelf.
Objective:- To access a book in an efficient way.

Definition 1:- A data structure is a way of organizing data that considers not only the items stored but also the relationships between them.
Objective:- Data should be retrievable in an efficient & convenient way.

Definition 2:- Data may be organized in many different ways. The logical or mathematical model of a particular organization of data is called a data structure.

Magic square problem:- Arrange the numbers 1 to 9 in a 3 x 3 matrix such that the sums of all rows, columns & diagonals are the same.

    6   1   8  | 15
    7   5   3  | 15
    2   9   4  | 15
   ------------
   15  15  15     (both diagonals also sum to 15)

Logic:- Top-left.

Types of Data Structures:-
[Figure: types of data structures]

Data structure operations:-
1) Traversing:- Accessing & processing each record exactly once.
2) Inserting:- Adding a new record to the data structure.
3) Deleting:- Removing a particular record from the data structure.
4) Sorting:- Arranging the data in some given order.
5) Searching:- Finding the location of a record.
6) Merging:- Combining the records in two different sorted files into a single file.

Basic Terminology:- Elementary data organization
Data item:- Refers to a single unit of values.
Ex.:- Age
Ex.:- Name, which is made up of Fname, Mname & Lname.
Collections of data are frequently organized into a hierarchy of fields, records & files.

Additional Terminology:-
1) Entity:- An entity is something that has certain attributes or properties.
2) Entity set:- Entities with similar attributes form an entity set. Ex.:- All employees of an organization.
3) Field:- A field is a single elementary unit of information representing an attribute of an entity.
4) Record:- It is the collection of fields of a given entity.
5) File:- It is the collection of records of the entities in a given entity set.
6) Primary key:- A field K which uniquely identifies each record in a file is called the primary key, & the values K1, K2, ... in such a field are called keys or key values.

The above organization of data into fields, records & files may not be complex enough to maintain & efficiently process certain collections of data.

The study of data structures includes the following three steps.
1) Logical or mathematical description of the structure.
2) Implementation of the structure on a computer.
3) Quantitative analysis of the structure, which includes determining the amount of memory needed to store the structure & the time required to process the structure.

Data Structures:- Data may be organized in many different ways. The logical or mathematical model of a particular organization of data is called a data structure.
The choice of a particular data model depends on two considerations.
1) It must be rich enough in structure to mirror the actual relationships of the data in the real world.
2) The structure should be simple enough that one can effectively process the data when necessary.

Different Data Structures:-

1) Array:- It is a linear data structure. An array is a collection of data elements of the same data type. [This structure uses contiguous memory locations.]
Let A be an array of N elements. Then, using bracket notation, its elements are A[1], A[2], ..., A[N].
The number K in A[K] is called the subscript & A[K] is called a subscripted variable.
[Elements of an array are stored in successive memory locations.]

Ex.:- STUDENT MARKS

   Rollno.   Sub 1   Sub 2   Sub 3   Sub 4
   1         50      60      70      55
   2         40      20      90      75

[Figure: one-dimensional array holding the names Ram, Amit, Rohan, Rita, Sunil]

Advantages:- The structure is simple. Arrays are easy to traverse, search & sort.
Disadvantages:- Insertion & deletion are difficult because they involve data movement.

2) Linked list:-
[Figure: linked list of nodes holding Amit, Chetan, Gita & Nitin; each node has an Info part & a Link part that points to the next node]

3) Stack:- A stack is also called a LIFO, i.e. Last-in-First-out, structure. It is a linear list in which whatever item is inserted last is deleted first.
Ex.:- A stack of books.

4) Queue:- It is also called a FIFO, i.e. First-in-First-out, structure. It is a linear list in which whatever item is inserted first is deleted first.
Ex.:- A queue of people waiting for a bus.

5) Trees:- Data frequently contain a hierarchical relationship between various elements. The data structure which reflects this relationship is called a tree.
Ex.:- An employee record
[Figure: tree diagram of the employee record listed below]

   01 Employee
      02 Soc Sec No.
      02 Name
         03 Lname
         03 Fname
         03 Mname
      02 Address
         03 Street
         03 Area
            04 City
            04 State
            04 Zip
      02 Age
      02 Salary
      02 Dependents

Ex.:- Consider the algebraic expression (2x + y)(a - 7b)^3.
[Figure: expression tree with * at the root; the left subtree is + over (2 * x) & y; the right subtree is the exponential operator over (a - 7 * b) & 3]

6) Graph:- Data sometimes contain a relationship between pairs of elements which is not necessarily hierarchical in nature. For Ex.:- Suppose an airline flies only between the cities connected by lines in the following fig. The data structure which reflects this type of relationship is called a graph.
[Figure: graph whose nodes are the cities Srinagar, Delhi, Mumbai, Chennai & Nagpur]

Data structure operations:-
The following four operations play a major role.
1) Inserting:- Adding a new record to the structure.
2) Deleting:- Removing a record from the structure.
3) Traversing:- Accessing each record exactly once so that certain items in the record may be processed.
4) Searching:- Finding the location of the record(s) which satisfy one or more conditions.
The following two operations are used in special situations.
1) Sorting:- Arranging the records in some order.
2) Merging:- Combining the records in two different sorted files into a single sorted file.

The major objective is to develop efficient algorithms for processing data. There are two major measures of the efficiency of an algorithm.
1) Time
2) Space
The complexity of an algorithm is the function which gives the running time and/or space in terms of the input size.

Complexity & space-time tradeoffs:-
Complexity:- The complexity of an algorithm is a function f(n) which measures the time and/or space used by the algorithm in terms of the input size n.
Space-time tradeoff:- A choice between algorithmic solutions of a data processing problem that allows one to decrease the running time of a solution by increasing the space used to store the data, & vice versa.

Algorithm Notation:-
Algorithm:- A step-by-step procedure to solve a particular problem.

Format:-
   Algorithm 1: (Find sum)
   [A paragraph which tells the purpose of the algorithm.]
   Step 1.
   Step 2.
     ...        [List of steps to be executed.]
   Step n. Exit.

Steps, Control, Exit:- The steps of the algorithm are executed one after the other, beginning with step 1, unless indicated otherwise. Control may be transferred to step n of the algorithm by the statement "Go to step n." The algorithm is completed with the statement Exit.

Comments:- Each step may contain a comment in brackets which indicates the main purpose of the step. The comment will usually appear at the beginning or at the end of the step.

Variable names:- Variable names will use capital letters. For Ex.:- DATA

Assignment statement:- An assignment statement will use the dots-equal notation ":=".
For Ex.:- Max := DATA[1]
The above statement assigns the value in DATA[1] to Max.

Input:- Data may be input & assigned to variables by means of a Read statement with the following form.
   Read: Variable names.

Output:- Messages placed in quotation marks, & data in variables, may be output by means of a Write statement with the following form.
   Write: Messages and/or variable names.

Procedure:- The term "procedure" will be used for an independent algorithmic module which solves a particular problem.
For such a module we use the word "procedure" instead of "algorithm".

Control Structures
There are three types of logic or flow of control.
1) Sequence logic or sequential flow
2) Selection logic or conditional flow
3) Iteration logic or repetitive flow (loops)

1) Sequential flow:-
   Algorithm: Module A, then Module B, then Module C.
   [Flowchart: Module A -> Module B -> Module C]

2) Conditional flow:-
The conditional structures fall into three types.

2.1> Single Alternative:-
Syntax:-
   If condition, then:
       [Module A]
   [End of If structure]
[Flowchart: if the condition holds (Yes), Module A is executed; otherwise (No) it is skipped]

2.2> Double Alternative:-
Syntax:-
   If condition, then:
       [Module A]
   Else:
       [Module B]
   [End of If structure]
[Flowchart: if the condition holds (Yes), Module A is executed; otherwise (No) Module B is executed]

2.3> Multiple Alternative:-
Syntax:-
   If condition(1), then:
       [Module A1]
   Else if condition(2), then:
       [Module A2]
     ...
   Else if condition(M), then:
       [Module AM]
   Else:
       [Module B]
   [End of If structure]

3) Repetitive flow [Loops]:-

3.1> Repeat-For loop:-
Syntax:-
   Repeat for K = R to S by T:
       [Module]
   [End of loop]
where R is the initial value, S is the end value or test value & T is the increment.
[Flowchart: K <- R; test "Is K > S?"; if No, execute the module (body of loop), set K <- K + T & test again; if Yes, leave the loop]

3.2> Repeat-While loop:-
Syntax:-
   Repeat while condition:
       [Module]
   [End of loop]
[Flowchart: test the condition; if Yes, execute the module (body of loop) & test again; if No, leave the loop]

Complexity of Algorithm
Suppose M is an algorithm, & suppose n is the size of the input data. The time & space used by the algorithm M are the two main measures of the efficiency of M.
The complexity of an algorithm M is the function f(n) which gives the running time and/or storage space requirement of the algorithm in terms of the size n of the input data.
There are three cases.
1) Worst case:- The maximum value of f(n) for any possible input.
2) Average case:- The expected value of f(n). The average case uses probability theory: suppose the numbers n1, n2, ..., nk occur with respective probabilities p1, p2, ..., pk. Then the expected value is $E = n_1 p_1 + n_2 p_2 + \dots + n_k p_k$.
3) Best case:- The minimum possible value of f(n).
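The three cases above can be illustrated with a small Python sketch. It assumes the possible costs n1, ..., nk and their probabilities p1, ..., pk are already known and given as two lists; the function name complexity_cases and its parameters are hypothetical, chosen only for this illustration.

```python
def complexity_cases(costs, probabilities):
    """Return (worst, average, best) for the given costs.

    costs         -- the possible values n1, ..., nk of f(n)
    probabilities -- the matching probabilities p1, ..., pk
    (Both names are illustrative assumptions, not notation from the notes.)
    """
    if not costs or len(costs) != len(probabilities):
        raise ValueError("costs and probabilities must be non-empty and of equal length")
    worst = max(costs)                       # maximum value of f(n)
    best = min(costs)                        # minimum value of f(n)
    # Expected value E = n1*p1 + n2*p2 + ... + nk*pk
    average = sum(n * p for n, p in zip(costs, probabilities))
    return worst, average, best


if __name__ == "__main__":
    # Four equally likely costs 1, 2, 3, 4
    print(complexity_cases([1, 2, 3, 4], [0.25, 0.25, 0.25, 0.25]))
```

With four equally likely costs 1, 2, 3 & 4 the average is (1 + 2 + 3 + 4)/4 = 2.5, which lies between the best case 1 & the worst case 4.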
Subalgorithms:-
A subalgorithm is a complete & independently defined algorithmic module which is used (or called) by some main algorithm or by some other subalgorithm. A subalgorithm receives values, called arguments, from an originating (calling) algorithm, performs computations & then sends back the result to the calling algorithm. The subalgorithm is defined independently so that it may be called by many different algorithms or called at different times in the same algorithm.
[The relationship between an algorithm & a subalgorithm is similar to the relationship between a main program & a subprogram in a programming language.]

Write a function subalgorithm MEAN to find the average of 3 numbers A, B & C.
Function 2.5: MEAN(A, B, C)
1. Set AVE := (A + B + C)/3.
2. Return(AVE).

Write a procedure SWITCH to interchange the values of AAA & BBB.
Procedure 2.6: SWITCH(AAA, BBB)
1. Set TEMP := AAA, AAA := BBB and BBB := TEMP.
2. Return.

Write an algorithm to find the roots of the quadratic equation ax^2 + bx + c = 0.
Algorithm 2.2: (Quadratic Equation) This algorithm inputs the coefficients A, B, C of a quadratic equation & outputs the real solutions, if any.
Step 1. Read: A, B, C.
Step 2. Set D := B^2 - 4AC.
Step 3. If D > 0, then:
            (a) Set X1 := (-B + √D)/(2A) and X2 := (-B - √D)/(2A).
            (b) Write: X1, X2.
        Else if D = 0, then:
            (a) Set X := -B/(2A).
            (b) Write: 'Unique solution', X.
        Else:
            Write: 'No real solutions'.
        [End of If structure.]
Step 4. Exit.

Write an algorithm to find the largest element in an array.
Algorithm 2.3: (Largest element in array) Given a non-empty array DATA with N numeric values, this algorithm finds the location LOC & the value Max of the largest element of DATA.
1. [Initialize] Set K := 1, LOC := 1 and Max := DATA[1].
2. Repeat steps 3 & 4 while K ≤ N:
3.     If Max < DATA[K], then:
           Set LOC := K and Max := DATA[K].
       [End of If structure.]
4.     Set K := K + 1.
   [End of step 2 loop.]
5. Write: LOC, Max.
6. Exit.
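Algorithm 2.3 can be sketched in Python as follows. This is only an illustration: Python lists are indexed from 0, so the sketch converts to the 1-based location LOC used in the notes, & the function name find_largest is an assumption made for this example.

```python
def find_largest(data):
    """Return (loc, max_value) for a non-empty list, with loc counted from 1.

    Mirrors Algorithm 2.3: DATA[K] in the notes is data[k - 1] here.
    """
    if not data:
        raise ValueError("DATA must be non-empty")
    loc, max_value = 1, data[0]              # step 1: initialize with the first element
    for k in range(2, len(data) + 1):        # steps 2 and 4: K runs up to N
        if max_value < data[k - 1]:          # step 3: strictly larger element found
            loc, max_value = k, data[k - 1]
    return loc, max_value


if __name__ == "__main__":
    print(find_largest([3, 9, 2, 9, 5]))     # location and value of the largest element
```

Because the comparison is the strict Max < DATA[K], the location of the first occurrence is reported when the largest value appears more than once.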
Linear search:-
Algorithm 2.4: (Linear search) A linear array DATA with N elements & a specific ITEM of information are given. This algorithm finds the location LOC of ITEM in the array DATA, or sets LOC = 0.
1. [Initialize] Set K := 1 and LOC := 0.
2. Repeat steps 3 & 4 while LOC = 0 and K ≤ N:
3.     If ITEM = DATA[K], then: Set LOC := K.
4.     [Increment counter] Set K := K + 1.
   [End of step 2 loop.]
5. [Successful?] If LOC = 0, then:
       Write: ITEM is not in the array DATA.
   Else:
       Write: LOC is the location of ITEM.
   [End of If structure.]
6. Exit.

Complexity of Linear search:-
The complexity of the search algorithm is given by the number C of comparisons between ITEM & DATA[K].
* Worst case:- The worst case occurs when ITEM is the last element in the array DATA or is not there at all. Accordingly, C(n) = n is the worst case complexity of the linear search algorithm.
* Average case:- Here we assume that ITEM does appear in DATA, and that it is equally likely to occur at any position in the array. Accordingly, the number of comparisons can be any of the numbers 1, 2, 3, ..., n, and each number occurs with probability p = 1/n. Then

$$C(n) = 1\cdot\frac{1}{n} + 2\cdot\frac{1}{n} + \dots + n\cdot\frac{1}{n} = (1 + 2 + \dots + n)\cdot\frac{1}{n} = \frac{n(n+1)}{2}\cdot\frac{1}{n} = \frac{n+1}{2}$$

(The average number of comparisons is approximately equal to half the number of elements in the data list.)

DATA Types:-
1) Character
2) Real (or floating point)
3) Integer (or fixed point)
4) Logical

* Variables:-
Global variables:- Variables that can be accessed by all program modules.
Local variables:- Each program module contains its own list of variables, called local variables.

Sieve Method:- To find all prime numbers less than m.
Prime no.:- An integer n > 1 is called a prime number if its only positive divisors are 1 & n.
If n > 1 is not prime (a composite no.), then n must have a divisor k ≠ 1 such that k ≤ √n, or in other words k^2 ≤ n.
Problem:- Find all prime numbers less than 30.
Step 1:- List the 30 numbers.
    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
   16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Step 2:- Cross out 1 & the multiples of 2 (other than 2 itself).
   Remaining: 2 3 5 7 9 11 13 15 17 19 21 23 25 27 29
Step 3:- Since 3 is the first number following 2 that has not been eliminated, cross out the multiples of 3 (other than 3 itself) from the list.
   Remaining: 2 3 5 7 11 13 17 19 23 25 29
Step 4:- Since 5 is the first number following 3 that has not been eliminated, cross out the multiples of 5 (other than 5 itself) from the list.
   Remaining: 2 3 5 7 11 13 17 19 23 29
Step 5:- Now 7 is the first number following 5 that has not been eliminated, but 7^2 > 30. This means the algorithm is finished & the numbers left in the list are the prime numbers less than 30.
   2 3 5 7 11 13 17 19 23 29

String processing:-
Each programming language contains a character set that is used to communicate with the computer. The set usually includes the following.
Alphabet:- A, B, ..., X, Y, Z
Digits:- 0, 1, 2, ..., 8, 9
Special characters:- +, -, /, *, ( ), $, =, ', and the comma

String:- A finite sequence S of zero or more characters is called a string.
Length:- The number of characters in a string is called its length. The string with zero characters is called the empty string or the null string.
Ex.:-
   String                   Length
   'THE END'                7
   'TO BE OR NOT TO BE'     18
   ''                       0
   '  ' (two blanks)        2

Concatenation of strings:-
Let S1 & S2 be strings. The string consisting of the characters of S1 followed by the characters of S2 is called the concatenation of S1 & S2. It will be denoted by S1 // S2.
For Ex.:- 'THE' // 'END' = 'THEEND'
          'THE' // ' ' // 'END' = 'THE END'
The length of S1 // S2 is equal to the sum of the lengths of the strings S1 & S2.

Substring:-
A string Y is called a substring of a string S if there exist strings X & Z such that S = X // Y // Z.
If X is an empty string then Y is called an initial substring of S, & if Z is an empty string then Y is called a terminal substring of S.
For Ex.:- 'THE' is an initial substring of 'THE END'.
          'BE OR NOT' is a substring of 'TO BE OR NOT TO BE'.

Storing Strings:-
Strings are stored in three types of structures.
1) Fixed-length structures
2) Variable-length structures
3) Linked structures

1) Record-oriented, fixed-length storage:-
In this type of storage, records of fixed length are used to store strings.

a) Records stored sequentially in memory:-
Suppose the records have fixed length 10 & are used to store the names of students.

   Address   Record (10 characters)
   200       A M I T
   210       F A R A K H A N
   220       P R I Y A

Advantages:-
1) The ease of accessing data from any given record.
2) The ease of updating data in any given record (as long as the length of the new data does not exceed the record length).
Disadvantages:-
1) Time is wasted reading an entire record if most of the storage consists of inessential blank spaces.
2) Certain records may require more space than is available.
3) When a correction consists of more or fewer characters than the original text, changing a misspelled word requires the entire record to be changed.

b) Records stored using pointers:-
In the above method (1 a), suppose we wanted to insert a new record. This would require all succeeding records to be moved to new memory locations. This disadvantage can be removed by using an array POINT which gives the address of each successive record, so that the records need not be stored in consecutive locations in memory. Accordingly, inserting a new record requires only an update of the array POINT.
[Figure: POINT[1], POINT[2] & POINT[3] point to the records AMIT, FARAKHAN & PRIYA respectively]

2) Variable-length storage with fixed maximum:-
Although strings may be stored in fixed-length memory locations, there are advantages in knowing the actual length of each string. For Ex.:- One then does not have to read the entire record when the string occupies only the beginning part of the memory location.
The storage of variable-length strings in memory cells with fixed lengths can be done in two general ways.
A) One can use a marker, such as two dollar signs ($$), to signal the end of the string.
B) One can list the length of the string as an additional item in the pointer array.

Records with sentinels:-
   A M I T $$
   F A R A K H A N $$
   P R I Y A $$

Records whose lengths are listed:-
   Pointer   Length   Record
   1         4        AMIT
   2         8        FARAKHAN
   3         5        PRIYA

Remarks:- Records stored sequentially. One might be tempted to store strings one after another by using some separation marker such as $$ (Fig. a) or by using a pointer array giving the location of the strings (Fig. b).
   Fig. a:- AMIT$$FARAKHAN$$PRIYA$$ ... END
   Fig. b:- AMITFARAKHANPRIYA ..., with pointers 1, 2, 3 giving the starting location of each string
Advantages:- This method of storing strings saves space.
Disadvantages:- This method is usually inefficient when the strings & their lengths are frequently being changed.

3) Linked storage:-
In this method strings are stored in a linked list. A linked list is a linearly ordered sequence of memory cells, called nodes, where each node contains an item, called a link, which points to the next node in the list.
[Figure: schematic diagram of a linked list]
Each memory cell is assigned one character or a fixed number of characters, & the link contains the address of the cell/node containing the next character or group of characters in the string.
Ex.:- String S = 'TO BE OR NOT TO BE'
(a) One character per node: [Figure: nodes T -> O -> (blank) -> B -> E -> ...]
(b) Four characters per node: [Figure: nodes 'TO B' -> 'E OR' -> ' NOT' -> ' TO ' -> 'BE']

Character data type:-
Constants:- Many programming languages denote string constants by placing the string in either single or double quotation marks.
For Ex.:- 'THE END' & 'TO BE OR NOT TO BE' are string constants of length 7 & 18 characters respectively.

Variables:- A character variable falls into one of three categories: static, semistatic & dynamic.
Static:- A static character variable is a variable whose length is defined before the program is executed & cannot change throughout the program.
Semistatic:- A semistatic character variable is a variable whose length may vary during the execution of the program as long as the length does not exceed a maximum value determined by the program before the program is executed.
Dynamic:- A dynamic character variable is a variable whose length can change during the execution of the program.

String operations:-

A) Substring:- A group of consecutive elements in a string is called a substring. To access a substring of a given string the following operation is used.
   SUBSTRING(string, initial, length)
where initial is the position of the first character of the substring in the given string & length is the length of the substring.
Ex.:- SUBSTRING('TO BE OR NOT TO BE', 4, 7) = 'BE OR N'
      SUBSTRING('THE END', 4, 4) = ' END'

B) Indexing:- Also called pattern matching, it refers to finding the position where a string pattern P first appears in a given string text T.
   INDEX(Text, Pattern)
If the pattern P does not appear in the text T, then INDEX is assigned the value 0.
Ex.:- Let T = 'HIS FATHER IS THE PROFESSOR'. Then,
      INDEX(T, 'THE') = 7
      INDEX(T, 'THEN') = 0
      INDEX(T, ' THE ') = 14

C) Concatenation:- Let S1 & S2 be strings. The concatenation of S1 & S2, denoted by S1 // S2, is the string consisting of the characters of S1 followed by the characters of S2.
Ex.:- Let S1 = 'MARK' & S2 = 'TWAIN'. Then,
      S1 // S2 = 'MARKTWAIN'
      S1 // ' ' // S2 = 'MARK TWAIN'

D) Length:- The number of characters in a string is called its length.
   LENGTH(string)
Ex.:- LENGTH('Computer') = 8
      LENGTH('0') = 1
      LENGTH(' ') = 1
      LENGTH('') = 0

Word Processing:-
The following are the word processing operations.

A) Insertion:- Inserting a string in the middle of the text. Suppose in a given text T we want to insert a string S so that S begins in position K.
   INSERT(Text, Position, String)
Ex.:- INSERT('ABCDEFG', 3, 'XYZ') = 'ABXYZCDEFG'
      INSERT('ABCDEFG', 6, 'XYZ') = 'ABCDEXYZFG'

B) Deletion:- Deleting a string from the text. Suppose in a given text T we want to delete the substring which begins in position K & has length L.
   DELETE(Text, Position, Length)
Ex.:- DELETE('ABCDEFG', 4, 2) = 'ABCFG'
      DELETE('ABCDEFG', 2, 4) = 'AFG'
      DELETE('ABCDEFG', 0, 2) = 'ABCDEFG'

C) Replacement:- Replacing one string in the text by another. Suppose in a given text T we want to replace the first occurrence of pattern P1 by pattern P2.
   REPLACE(Text, Pattern1, Pattern2)
Ex.:- REPLACE('XABYABZ', 'AB', 'C') = 'XCYABZ'
      REPLACE('XABYABZ', 'BA', 'C') = 'XABYABZ'
In the second case the pattern BA does not occur, & hence there is no change.

Algorithm 3.1:- A text T & a pattern P are in memory. This algorithm deletes every occurrence of P in T.
1. [Find index of P.] Set K := INDEX(T, P).
2. Repeat while K ≠ 0:
       (a) [Delete P from T.] Set T := DELETE(T, INDEX(T, P), LENGTH(P)).
       (b) [Update index.] Set K := INDEX(T, P).
   [End of loop.]
3. Write: T.
4. Exit.

Algorithm 3.2:- A text T & patterns P & Q are in memory. This algorithm replaces every occurrence of P in T by Q.
1. [Find index of P.] Set K := INDEX(T, P).
2. Repeat while K ≠ 0:
       (a) [Replace P by Q.] Set T := REPLACE(T, P, Q).
       (b) [Update index.] Set K := INDEX(T, P).
   [End of loop.]
3. Write: T.
4. Exit.

Pattern Matching algorithms:-
There are two algorithms given in the book.

1) First pattern matching algorithm:-
In this algorithm we compare a given pattern P with each of the substrings of T, moving from left to right, until we get a match.
Suppose that P is a 4-character string & T is a 20-character string, and that P & T appear in memory as linear arrays with one character per element, i.e.
   P = P[1] P[2] P[3] P[4]
   T = T[1] T[2] T[3] ... T[19] T[20]
Then P is compared with each of the following 4-character substrings of T.
   W1 = T[1] T[2] T[3] T[4], W2 = T[2] T[3] T[4] T[5], ..., W17 = T[17] T[18] T[19] T[20]
NOTE:- Max = 20 - 4 + 1 = 17 is the number of such substrings of T. In general,
   Max = LENGTH(T) - LENGTH(P) + 1
This algorithm contains two loops, one inside the other. The outer loop runs through each successive R-character substring W K of T, where R = LENGTH(P).
The inner loop compares P with W K, character by character. If any character does not match, control transfers to the next substring. If P does not appear in T then INDEX = 0.

Algorithm 3.3: (Pattern Matching) P & T are strings with lengths R & S respectively, & are stored as arrays with one character per element. This algorithm finds the INDEX of P in T.
1. [Initialize] Set K := 1 & Max := S - R + 1.
2. Repeat steps 3 to 5 while K ≤ Max:
3.     Repeat for L = 1 to R: [Tests each character of P]
           If P[L] ≠ T[K + L - 1], then: Go to step 5.
       [End of inner loop]
4.     [Success] Set INDEX := K & Exit.
5.     Set K := K + 1.
   [End of step 2 outer loop]
6. [Failure] Set INDEX := 0.
7. Exit.

Complexity of first pattern matching algorithm:-
It is measured by the number C of comparisons between characters in the pattern P & characters of the text T. Then
   C = N1 + N2 + ... + NL
where Nk is the number of comparisons made when P is compared with Wk, & L is the position in T where P first appears, or L = Max if P does not appear in T.

Ex. 1:- Suppose P = a a b a & T = c d c d c d ... = (cd)^10.
Max = 20 - 4 + 1 = 17. Every Wk fails on its first character, hence
   C = 1 + 1 + ... + 1 = 17

Ex. 2:- Suppose P = a a b a & T = a b a b a a b a ...
   W1 = a b a b, W2 = b a b a, W3 = a b a a, W4 = b a a b, W5 = a a b a
Hence C = 2 + 1 + 2 + 1 + 4 = 10

Ex. 3:- Suppose P = a a a b & T = a a ... = a^20.
Max = 20 - 4 + 1 = 17 & W1 = W2 = ... = W17 = a a a a. Each Wk needs 4 comparisons, hence
   C = 4 + 4 + ... + 4 = 68

In general, P is an r-character string & T is an s-character string, so the data size for the algorithm is n = r + s.
In the worst case every character of P except the last matches every substring Wk. In this case, since Max = s - r + 1,
   $C(n) = r(s - r + 1)$   [See Ex. 3]
For fixed n we have s = n - r, so $C(n) = r(n - 2r + 1)$. The maximum value of C(n) occurs when r = (n + 1)/4. Accordingly,
   $C(n) = \frac{(n+1)^2}{8} = O(n^2)$
The complexity of this pattern matching algorithm is O(n^2).

2) Second pattern matching algorithm:-
Algorithm 3.4: (Pattern matching) The pattern matching table F(Qi, t) of a pattern P is in memory, & the input is an N-character string T = T1 T2 ... TN. This algorithm finds the INDEX of P in T.
1. [Initialize] Set K := 1 & S1 := Q0.
2. Repeat steps 3 to 5 while SK ≠ P & K ≤ N:
3.     Read TK.
4.     Set SK+1 := F(SK, TK). [Finds next state]
5.     Set K := K + 1. [Updates counter]
   [End of step 2 loop]
6. [Successful?] If SK = P, then:
       INDEX := K - LENGTH(P).
   Else:
       INDEX := 0.
   [End of If structure]
7. Exit.

Complexity of second pattern matching algorithm:-
The running time of the above algorithm is proportional to the number of times the step 2 loop is executed. The worst case occurs when all of the text T is read, i.e. the loop is executed n = LENGTH(T) times. Accordingly, the complexity of this pattern matching algorithm is O(n).
The second pattern matching algorithm is faster than the first pattern matching algorithm.

Problems
In the examples below, Λ denotes the empty string.

Ex. 1:- Let P = a a b a.
The initial substrings of P are
   Q0 = Λ, Q1 = a, Q2 = a^2, Q3 = a^2 b, Q4 = a^2 b a = P
[Q0 = Λ is the empty string.]
For each character t, the entry ƒ(Qi, t) in the table is the largest Q which appears as a terminal substring in the string Qi t.
We compute
   ƒ(Λ, a) = a          i.e. ƒ(Q0, a) = Q1
   ƒ(Λ, b) = Λ          i.e. ƒ(Q0, b) = Q0
   ƒ(a, a) = a^2        i.e. ƒ(Q1, a) = Q2
   ƒ(a, b) = Λ          i.e. ƒ(Q1, b) = Q0
   ƒ(a^2, a) = a^2      i.e. ƒ(Q2, a) = Q2
   ƒ(a^2, b) = a^2 b    i.e. ƒ(Q2, b) = Q3
   ƒ(a^2 b, a) = P      i.e. ƒ(Q3, a) = Q4 = P
   ƒ(a^2 b, b) = Λ      i.e. ƒ(Q3, b) = Q0

Pattern matching table:-
   Q    Q0   Q1   Q2   Q3
   a    Q1   Q2   Q2   P
   b    Q0   Q0   Q3   Q0

Pattern matching graph:-
[Figure: directed graph with nodes Q0, Q1, Q2, Q3 & P; each edge is labelled a or b according to the table above]

Ex. 2:- Consider the pattern P = a a a b b.
First list the initial segments of P:
   Q0 = Λ, Q1 = a, Q2 = a^2, Q3 = a^3, Q4 = a^3 b, Q5 = a^3 b^2 = P
For each character t, the entry ƒ(Qi, t) in the table is the largest Q which appears as a terminal substring in the string Qi t.
We compute
   ƒ(Λ, a) = a = Q1               ƒ(Λ, b) = Λ = Q0
   ƒ(a, a) = a^2 = Q2             ƒ(a, b) = Λ = Q0
   ƒ(a^2, a) = a^3 = Q3           ƒ(a^2, b) = Λ = Q0
   ƒ(a^3, a) = a^3 = Q3           ƒ(a^3, b) = a^3 b = Q4
   ƒ(a^3 b, a) = a = Q1           ƒ(a^3 b, b) = a^3 b^2 = P
Hence we obtain the following pattern matching table:
   Q    Q0   Q1   Q2   Q3   Q4
   a    Q1   Q2   Q3   Q3   Q1
   b    Q0   Q0   Q0   Q4   P
Pattern matching graph for the above table:-
[Figure: directed graph with nodes Q0, Q1, Q2, Q3, Q4 & P; each edge is labelled a or b according to the table above]

Ex. 3:- Let the pattern P = a b a b a b.
The initial substrings of P are:
   Q0 = Λ, Q1 = a, Q2 = ab, Q3 = aba, Q4 = abab, Q5 = ababa, Q6 = ababab = P
The function ƒ gives the entries in the table as follows:
   ƒ(Λ, a) = a = Q1               ƒ(Λ, b) = Λ = Q0
   ƒ(a, a) = a = Q1               ƒ(a, b) = ab = Q2
   ƒ(ab, a) = aba = Q3            ƒ(ab, b) = Λ = Q0
   ƒ(aba, a) = a = Q1             ƒ(aba, b) = abab = Q4
   ƒ(abab, a) = ababa = Q5        ƒ(abab, b) = Λ = Q0
   ƒ(ababa, a) = a = Q1           ƒ(ababa, b) = ababab = P
Hence we obtain the following pattern matching table:
   Q    Q0   Q1   Q2   Q3   Q4   Q5
   a    Q1   Q1   Q3   Q1   Q5   Q1
   b    Q0   Q2   Q0   Q4   Q0   P
Pattern matching graph for the above table:-
[Figure: directed graph with nodes Q0, Q1, Q2, Q3, Q4, Q5 & P; each edge is labelled a or b according to the table above]
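The table construction used in the examples above, together with the second pattern matching algorithm, can be sketched in Python. This is a minimal illustration under stated assumptions: each state Qi is represented by the integer i (the length of the initial substring), the table is built by directly testing terminal substrings, & the function names build_table and pattern_index are hypothetical names chosen for this sketch.

```python
def build_table(pattern, alphabet):
    """Build the pattern matching table as a list of dicts.

    table[i][t] = j means f(Qi, t) = Qj, where Qj is the largest initial
    substring of the pattern that is a terminal substring of Qi followed by t.
    """
    table = []
    for i in range(len(pattern)):            # states Q0 .. Q(r-1); P itself is final
        row = {}
        for t in alphabet:
            candidate = pattern[:i] + t      # the string Qi t
            j = min(len(pattern), len(candidate))
            # Shrink j until the initial substring Qj ends the candidate string.
            while j > 0 and not candidate.endswith(pattern[:j]):
                j -= 1
            row[t] = j
        table.append(row)
    return table


def pattern_index(text, pattern):
    """Return the 1-based INDEX of pattern in text, or 0 if it does not appear."""
    if not pattern:
        raise ValueError("this sketch assumes a non-empty pattern")
    table = build_table(pattern, sorted(set(pattern)))
    state = 0                                # start in Q0
    for k, ch in enumerate(text, start=1):
        # A character that never occurs in the pattern sends us back to Q0.
        state = table[state].get(ch, 0)
        if state == len(pattern):            # reached P after reading k characters
            return k - len(pattern) + 1
    return 0


if __name__ == "__main__":
    print(build_table("aaba", "ab"))
    print(pattern_index("ababaaba", "aaba"))
```

For P = aaba the rows of the computed table correspond to the columns Q0 to Q3 of the table in Ex. 1. Each character of the text is read once by pattern_index, in line with the O(n) running time stated for the second algorithm; building the table is extra work that depends only on the pattern & the alphabet.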
© Copyright 2024