IJournals: International Journal of Software & Hardware Research in Engineering
ISSN-2347-4890, Volume 3, Issue 4, April 2015

A NOVEL ALGORITHM DUFP FOR MINING ASSOCIATION RULES IN DYNAMIC DATABASES

Authors: Shahana Sheikh1, Mr. Rupesh Patidar2
1 PG Scholar, CSE Department, Jawaharlal Institute of Technology, Borawan (M.P.)
2 Asst. Prof., CSE Department, Jawaharlal Institute of Technology, Borawan (M.P.)
E-mail: [email protected], [email protected]

ABSTRACT
This dissertation proposes a new algorithm, DUFP, to generate frequent patterns incrementally. DUFP works not only for incremented databases but also for decremented databases. The DUFP algorithm uses a vertical data format approach. It gives better performance by reducing database scans, and it also reduces the computational complexity and the computational time needed to generate frequent patterns. There are several approaches to the incremental mining problem, such as re-running the mining algorithm on the updated database, but two important factors have to be considered: the efficiency and the effectiveness of the algorithm.

Keywords
Association Rule Mining, Data Mining, Incremental Mining, Frequent Itemset, Minimum Support, Minimum Confidence.

1. INTRODUCTION
Data mining is the practice of sorting through large amounts of data and picking out relevant information. It is usually used by business intelligence organizations and financial analysts, but it is increasingly used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods.

Data mining, popularly known as Knowledge Discovery in Databases (KDD), is the nontrivial extraction of implicit, previously unknown and potentially useful information from data in databases [2, 3]. It is the process of finding the hidden information and patterns in repositories [1, 2, 3]. The term data mining is applied to two separate processes: knowledge discovery and prediction. Knowledge discovery provides explicit information that has a readable form and can be understood by a user. Prediction provides forecasts of future events.

Association rule: Association rules were first introduced by Agrawal [4]. An association rule is a statement that expresses a relationship between data items in a database. An association rule has two parts, the antecedent and the consequent. For example, in {egg} => {milk}, egg is the antecedent and milk is the consequent.

Figure 1: Generating Association rules

Support: Support indicates how frequently an item occurs in the database. For a rule A => B, the support is the percentage of transactions in the database that contain A ∪ B (that is, both A and B) [5].

Confidence: Confidence indicates how often the rule is found to be true. The confidence of the rule given above is the percentage of the transactions containing A that also contain B [5].

2. INCREMENTAL FREQUENT PATTERN MINING
The mining of frequent patterns on a transactional database is usually an offline process, since it is costly to find frequent patterns in large databases. In market-basket applications, new transactions are generated and old transactions may become obsolete as time advances. As a result, incremental updating techniques should be developed to maintain the discovered frequent item sets for the updated database. Discovering the frequent item sets becomes more time consuming if the dataset is incremental in nature. In an incremental dataset, new records are added from time to time and existing transactions are also deleted from time to time.

Figure 2: An illustrative transaction database

There are several important characteristics in the updated database:
1. The update problem can be reduced to finding the new set of frequent item sets. After that, the new association rules can be computed from the new frequent item sets.
2. An old frequent item set has the potential to become infrequent in the updated database.
3. Similarly, an old infrequent item set could become frequent in the new database.
4. In order to find the new frequent item sets "exactly", all the records in the updated database, including those from the original database, have to be checked against every candidate set.

3. EXISTING WORK
Several approaches have been developed to solve this problem. Some of them are:
1. FUP (Fast UPdate)
2. MAAP (Maintaining Association rule using Apriori Property)
3. Border Set Algorithm
4. FUP2 (Fast UPdate 2)

Figure 3: Generation of candidate item sets and frequent item sets using Apriori Algorithm

FUP uses Apriori concepts to mine frequent patterns from the incremental as well as the old database. This method works well when the number of added transactions is small. When the number of added transactions is very large it is not efficient, because generating candidates in the old and new databases, adding or subtracting their counts, and then comparing their support counts with the given minimum support count is a very time-consuming process. A second important problem with this method is that it works only for incremented databases, not for decremented databases.

3.1 FUP Algorithm
Cheung et al. proposed the FUP algorithm to incrementally maintain association rules when new transactions are inserted [6, 7]. FUP (Fast UPdate) is the first algorithm proposed to solve the problem of incremental mining of association rules. It handles databases with transaction insertion only and is not able to deal with transaction deletion. Take a sample transactional database with a minimum support of 40%.
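As a rough illustration of this insertion-only update, the sketch below maintains frequent single items when an increment is appended. It is not the authors' implementation: the function names (`count_items`, `fup_update_items`) and the twelve transactions are hypothetical, chosen only so that the 40% threshold has something to act on, and the rule that an old infrequent item is rechecked only if it is frequent within the increment is assumed here as the usual FUP pruning idea from [6].

```python
from collections import Counter


def count_items(transactions):
    """Count how many transactions contain each single item."""
    counts = Counter()
    for items in transactions:
        counts.update(set(items))
    return counts


def fup_update_items(old_db, old_frequent, increment, min_support):
    """Return the frequent single items (with counts) after inserting `increment`.

    old_frequent maps each item that was frequent in old_db to its old count.
    """
    threshold = min_support * (len(old_db) + len(increment))
    inc_counts = count_items(increment)
    updated = {}

    # Items frequent in the old database: only the increment is scanned.
    for item, old_count in old_frequent.items():
        new_count = old_count + inc_counts.get(item, 0)
        if new_count >= threshold:
            updated[item] = new_count

    # Items infrequent in the old database: assumed pruning rule, they are
    # rechecked only if they are frequent inside the increment itself.
    inc_threshold = min_support * len(increment)
    candidates = [item for item, count in inc_counts.items()
                  if item not in old_frequent and count >= inc_threshold]
    if candidates:
        old_counts = count_items(old_db)  # the costly rescan of the old data
        for item in candidates:
            new_count = old_counts.get(item, 0) + inc_counts[item]
            if new_count >= threshold:
                updated[item] = new_count
    return updated


# Hypothetical data: nine old transactions and three inserted ones.
old_db = [
    {"A", "B", "E"}, {"A", "B", "C"}, {"A", "C", "E"},
    {"B", "C"}, {"A", "B", "C", "E"}, {"A", "B", "D"},
    {"C", "E"}, {"A", "B", "C"}, {"B", "D"},
]
increment = [{"A", "B", "D"}, {"B", "C", "D"}, {"A", "D"}]

old_threshold = 0.4 * len(old_db)
old_frequent = {item: count for item, count in count_items(old_db).items()
                if count >= old_threshold}
result = fup_update_items(old_db, old_frequent, increment, 0.4)
```

In this made-up data, E is frequent in the old database but falls below 40% of the twelve transactions after the insertion, while D crosses the threshold only once the increment is counted.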
Now three transactions, T10, T11 and T12, are added to this database.

Figure 4: An illustrative database for performing algorithm FUP and the original frequent itemsets generated using association rule mining algorithm(s)

Figure 5: The first iteration of algorithm FUP on the dataset

Figure 6: The second iteration of algorithm FUP on the dataset

The frequent item sets are {{A}, {B}, {C}, {D}, {A,B}, {A,C}, {B,C}, {A,B,C}}. Clearly, one new item, D, is now frequent although it was not frequent in the old database. Similarly, E, which was frequent in the old database, is deleted.

3.2 FUP2 Algorithm
To overcome the problem of FUP [6], a new extension, FUP2 [7], has been developed. FUP2 works not only for incremented databases but also for decremented databases. Consider a simple example with nine transactions in the transaction database. The first three transactions (T1, T2, T3) are deleted and three more transactions (T10, T11, T12) are added, so the updated database again contains nine transactions (T4, T5, ..., T12). To generate the frequent item sets we apply the FUP2 method.

Figure 7: Generation of new frequent item sets using Algorithm FUP2

In the updated database the frequent items are different from those of the old database.

3.3 Border Set Algorithm
The Borders algorithm [8] is the first dynamic algorithm with the capability to handle all cases: transaction addition, transaction addition and deletion, and a change in the minimum support threshold. While the incremental algorithms [6, 7] prior to Borders addressed transaction insertion and deletion, they never considered situations where the minimum support needs to change. The Borders algorithm uses the concept of border sets [9].

Figure 8: Border Set Example
An itemset A is considered a border set if A is not frequent but all of its proper subsets are frequent. When the database is updated, new frequent item sets are expected to occur only if some border sets have reached the minimum support threshold and hence become frequent item sets, called promoted border sets. The Borders algorithm further maintains that the old database is scanned only if some border sets have reached the minimum support and become promoted border sets. As in the Apriori algorithm, the same Apriori concepts are employed in the Borders algorithm to generate candidates. Its costs are the several scans of the old database needed when new candidate sets arise, in addition to the cost of updating and maintaining the border sets. However, performance comparisons [8] reveal that the Borders algorithm performs better than FUP.

4. Problem Statement
There are two main problems in FUP and FUP2:
1. Candidate generation is a time-consuming process, because candidate item sets are first generated and then pruned if they do not satisfy the minimum support value.
2. FUP and FUP2 work well if the number of incremented transactions is small, but if it is large the performance of FUP and FUP2 degrades.

5. Proposed Method for DUFP
DUFP uses the vertical data format. The database is converted into vertical format and then simple intersection operations are applied to generate frequent patterns. Consider the previous example.

Figure 9: An old transaction database

Fig. 10: Updated transaction database

The updated database has nine transactions. The transactional database D' is now converted into a vertical database. Assign a code to each row; for example, item set A has code R1. Use S to mark a row as selected for the next step and D to mark it as deleted. Deleted items are not considered in the next step.
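The conversion and the S/D marking described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the nine transactions T4 to T12 are hypothetical stand-ins for the figures, the minimum support count of 4 (40% of nine transactions, rounded up) is an assumption, and the helper names `to_vertical` and `mark_rows` are not taken from the paper.

```python
def to_vertical(transactions):
    """Map each item to the set of transaction IDs (TIDs) that contain it."""
    vertical = {}
    for tid, items in transactions.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def mark_rows(vertical, min_count):
    """Give each row a code and mark it 'S' (select) or 'D' (delete)."""
    rows = []
    for number, item in enumerate(sorted(vertical), start=1):
        tids = vertical[item]
        mark = "S" if len(tids) >= min_count else "D"
        rows.append((f"R{number}", item, sorted(tids), mark))
    return rows


# Hypothetical updated database D' with nine transactions.
updated_db = {
    "T4": {"A", "B"},
    "T5": {"A", "B", "E"},
    "T6": {"A", "B", "C"},
    "T7": {"A", "B", "D"},
    "T8": {"B", "D"},
    "T9": {"B", "D", "F"},
    "T10": {"B", "C", "D"},
    "T11": {"C", "E"},
    "T12": {"C"},
}

vertical = to_vertical(updated_db)
rows = mark_rows(vertical, min_count=4)  # assumed threshold
marks = {item: mark for _code, item, _tids, mark in rows}
```

Each row of `rows` corresponds to one line of the vertical table: a row code, the item, the IDs of the transactions containing it, and the S or D mark. Only rows marked S would be carried into the intersection step.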
Only four items remain in the table. The two items E and F are deleted because their support counts do not satisfy the minimum support value.

Now perform intersections between the row IDs to generate the candidate two-item sets. Delete those item sets that do not satisfy the minimum support count. Only two item sets, {A, B} and {B, D}, satisfy the minimum support count.

Now consider the three-item set {A, B, D}. No three-item set is present in the updated database.

The above method finds the same frequent item sets as those generated by FUP and FUP2.
Frequent one-item sets: {A}, {B}, {C} and {D}.
Frequent two-item sets: {A, B} and {B, D}.

DUFP Algorithm
Initialize: K := 1; C1 := all the 1-item sets;
Read the database to count the support of C1 and determine L1;
L1 := {frequent 1-item sets};
K := 2;   // K represents the pass number
While (L(K-1) ≠ Φ) do
Begin
    CK := gen_candidate_itemsets with the given L(K-1);
    Prune(CK);
    For all candidates in CK do
        Count the number of transactions that are common to each item ∈ CK;
    LK := all candidates in CK with minimum support;
    K := K + 1;
End

Figure 11: Flow chart of DUFP algorithm

The flowchart represents the step-by-step working of the algorithm. In the first step, DUFP converts the database into vertical format. In the second step it finds the candidate one-item sets C1, and from these candidates it finds the frequent one-item sets L1. In step 3, only those items that satisfy the minimum support count are kept in L1 and the remaining items are deleted. From L1 the candidate set C2 of two-item sets is generated, and from C2 the set L2 is obtained. This process is repeated until all frequent item sets are generated or no more candidate generation is possible.
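One way to read the level-wise procedure above is the following sketch, which generates candidates by joining frequent item sets of the previous pass and counts support by intersecting their transaction-ID sets. It is an interpretation, not the authors' code: the function name `mine_vertical`, the vertical table and the minimum support count of 4 are all assumptions, with the data chosen so that it reproduces the item sets reported in the text.

```python
from itertools import combinations


def mine_vertical(vertical, min_count):
    """Level-wise frequent item set mining by TID-set intersection."""
    # Pass 1: keep the single items that meet the minimum support count (L1).
    level = {frozenset([item]): tids
             for item, tids in vertical.items() if len(tids) >= min_count}
    frequent = {}
    k = 2  # pass number
    while level:
        frequent.update({itemset: len(tids) for itemset, tids in level.items()})
        next_level = {}
        for (a, tids_a), (b, tids_b) in combinations(level.items(), 2):
            candidate = a | b
            if len(candidate) != k:
                continue
            # Prune: every (k-1)-subset of a candidate must itself be frequent.
            if any(frozenset(subset) not in level
                   for subset in combinations(candidate, k - 1)):
                continue
            # Support of the candidate = size of the common transaction set.
            common = tids_a & tids_b
            if len(common) >= min_count:
                next_level[candidate] = common
        level = next_level
        k += 1
    return frequent


# Hypothetical vertical database (item -> transaction IDs).
vertical = {
    "A": {"T4", "T5", "T6", "T7"},
    "B": {"T4", "T5", "T6", "T7", "T8", "T9", "T10"},
    "C": {"T6", "T10", "T11", "T12"},
    "D": {"T7", "T8", "T9", "T10"},
    "E": {"T5", "T11"},
    "F": {"T9"},
}

frequent = mine_vertical(vertical, min_count=4)  # assumed threshold
```

On this made-up table, E and F are dropped in the first pass, {A, B} and {B, D} survive the second pass, and the only three-item candidate, {A, B, D}, is pruned because {A, D} is not frequent. The database itself is read once to build the vertical table; every later pass works on the stored transaction-ID sets.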
6. Experimental Results

Comparison between FUP and DUFP on the basis of minimum support:
The FUP and DUFP algorithms are compared on the basis of minimum support when the database has 1,000 transactions. Figure 12 shows the comparison between the FUP and DUFP algorithms.

Minimum support   Average time taken to execute   Average time taken to execute
                  (in milliseconds)               (in seconds)
                  FUP Algorithm                   DUFP Algorithm
50                370 × 10^-3                     254 × 10^-3
60                208 × 10^-3                     156 × 10^-3
70                117 × 10^-3                     109 × 10^-3

Figure 12: Execution time taken by FUP and DUFP

Figure 12.1: Comparison between FUP and DUFP

From Figure 12.1 it is clear that when the support count increases, the execution time decreases, because fewer items satisfy the minimum support condition. In the first stage most of the items that do not satisfy the minimum support condition are deleted, so fewer items have to be scanned in the next step. It is clear that FUP takes more time than DUFP.

Comparison between FUP2 and DUFP on the basis of minimum support:
The FUP2 and DUFP algorithms are compared on the basis of minimum support when the database has 1,000 transactions. Figure 13 shows the comparison between the FUP2 and DUFP algorithms.

Minimum support   Average time taken to execute   Average time taken to execute
                  (in milliseconds)               (in milliseconds)
                  FUP2 Algorithm                  DUFP Algorithm
50                371 × 10^-3                     254 × 10^-3
60                203 × 10^-3                     156 × 10^-3
70                115 × 10^-3                     109 × 10^-3

Figure 13: Execution time taken by FUP2 and DUFP

Figure 13.1: Comparison between FUP2 and DUFP

From Figure 13.1 it is clear that when the support count increases, the execution time decreases, because fewer items satisfy the minimum support condition. In the first stage most of the items that do not satisfy the minimum support condition are deleted, so fewer items have to be scanned in the next step. It is clear that DUFP takes less time than FUP2, so DUFP is more efficient in terms of execution time.

Comparison between FUP and DUFP on the basis of number of records added:
The FUP and DUFP algorithms are compared on the basis of the number of records added to the database when the minimum support is 70% and the total number of transactions is 1,000. Figure 14 shows the comparison between the FUP and DUFP algorithms.

Number of records added    Average time taken to execute   Average time taken to execute
to the database            (in milliseconds)               (in milliseconds)
(in percentage)            FUP Algorithm                   DUFP Algorithm
100                        118 × 10^-3                     112 × 10^-3
200                        129 × 10^-3                     125 × 10^-3
300                        130 × 10^-3                     128 × 10^-3

Figure 14: Execution time taken by FUP and DUFP

Figure 14.1: Comparison between FUP and DUFP using percentage of records added to the database

Figure 14.1 presents the scalability of the algorithms. Records are added in terms of a percentage of the existing records, in steps such as 100, 200 and so on. From Figure 14.1 it is clear that DUFP is more efficient than FUP.

7. Conclusion
From the experimental data presented it can be concluded that the DUFP-based approach achieves better performance by reducing the database scans and hence the computational time; thus the DUFP algorithm behaves better than the FUP and FUP2 algorithms. FUP and FUP2 require more database scans because they use Apriori-based concepts to generate candidate sets and obtain the frequent patterns only after pruning. In addition, FUP works only for addition, whereas DUFP works for both addition and deletion.

8. Future Work
One direction is to enhance this application for multidimensional databases, so that the application is also able to generate multilevel frequent patterns. One can also enhance this application to find utility items in the transactions, so that the high-utility and low-utility items can be understood for business planning.

9. References
[1] Introduction to Data Mining and Knowledge Discovery, Third Edition, ISBN: 1-892095-02-5, Two Crows Corporation, 10500 Falls Road, Potomac, MD 20854 (U.S.A.), 1999.
[2] Dunham, M. H., Sridhar S., "Data Mining: Introductory and Advanced Topics", Pearson Education, New Delhi, ISBN: 81-7758-785-4, 1st Edition, 2006.
[3] Fayyad, U., Piatetsky-Shapiro, G., and Smyth P., "From Data Mining to Knowledge Discovery in Databases", AI Magazine, American Association for Artificial Intelligence, 1996.
[4] Rakesh Agrawal, T. Imieliński, A. Swami, "Mining association rules between sets of items in large databases", In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD '93), 1993, pp. 207-216.
[5] Jogi Suresh, T. Ramanjaneyulu, "Mining Frequent Itemsets Using Apriori Algorithm", International Journal of Computer Trends and Technology, ISSN 2231-2803, Vol. 4, Issue 4, April 2013.
[6] Cheung D W, Han J, Ng V T and Wong C Y (1996), "Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique", 12th International Conference on Data Engineering, New Orleans, Louisiana.
[7] Cheung D W, Lee D W, and Kao S D (1997), "A General Incremental Technique for Maintaining Discovered Association Rules", In: Proceedings of the Fifth International Conference on Database Systems for Advanced Applications, Melbourne, Australia.
[8] Feldman R, Aumann Y and Lipshtat O (1999), "Borders: An Efficient Algorithm for Association Generation in Dynamic Databases", Journal of Intelligent Information Systems, pp. 61-73.
[9] Mannila H. and Toivonen H (1997), "Levelwise Search and Borders of Theories in Knowledge Discovery", Data Mining and Knowledge Discovery, pp. 241-258.