IJournals: International Journal of Software & Hardware Research in Engineering
ISSN-2347-4890
Volume 3 Issue 4 April, 2015
A NOVEL ALGORITHM DUFP for MINING
ASSOCIATION RULES in DYNAMIC
DATABASES
Author: Shahana Sheikh1, Mr. Rupesh Patidar2
PG Scholar, CSE Department, Jawaharlal Institute of Technology, Borawan (M.P.)1,
Asst. Prof. of CSE Department, Jawaharlal Institute of Technology, Borawan (M.P.)2
E-mail: [email protected], [email protected]
ABSTRACT
This dissertation proposes a new algorithm, DUFP, to generate frequent patterns incrementally. DUFP works not only for incremented databases but also for decremented databases. The DUFP algorithm uses a vertical data format approach and gives better performance by reducing the number of database scans. DUFP also reduces the computational complexity and the computation time needed to generate frequent patterns. There are several approaches to the incremental mining problem, such as re-running the mining algorithm on the updated database, but two important factors have to be considered: the efficiency and the effectiveness of the algorithm.
Keywords
Association Rule Mining, Data Mining, Incremental Mining, Frequent Itemset, Minimum Support, Minimum Confidence.
1. INTRODUCTION
Data mining is the process of sorting through large amounts of data and picking out relevant information. It is usually used by business intelligence organizations and financial analysts, but it is increasingly used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods. Data mining, popularly known as Knowledge Discovery in Databases (KDD), is the nontrivial extraction of implicit, previously unknown and potentially useful information from data in databases [2, 3]. It is the process of finding the hidden information and patterns in repositories [1, 2, 3]. The term data mining is applied to two separate processes: knowledge discovery and prediction. Knowledge discovery provides explicit information that has a readable form and can be understood by a user. Prediction provides predictions of future events.
Association rule:
Association rules were first introduced by Agrawal [4]. Association rules are statements that describe the relationships between data items in a database. An association rule has two parts, the antecedent and the consequent. For example, in {egg} => {milk}, egg is the antecedent and milk is the consequent.
© 2015, IJournals All Rights Reserved
Figure 1: Generating Association rules
Support
Support indicates how frequently an itemset occurs in the database. For a rule A => B, its support is the percentage of transactions in the database that contain A ∪ B (that is, both A and B) [5].
Confidence
Confidence indicates how often the rule is found to be true. The confidence of the rule A => B is the percentage of transactions in the database containing A that also contain B [5].
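The two measures can be illustrated with a minimal Python sketch. The function names `support` and `confidence` and the toy transactions are hypothetical and are not taken from the paper; the code only restates the definitions of support and confidence for a rule A => B.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing A that also contain B.

    Returns None when the antecedent never occurs (confidence is undefined).
    """
    a = set(antecedent)
    ab = a | set(consequent)
    count_a = sum(1 for t in transactions if a <= set(t))
    if count_a == 0:
        return None
    count_ab = sum(1 for t in transactions if ab <= set(t))
    return count_ab / count_a


# Hypothetical toy database (not from the paper).
transactions = [
    {"egg", "milk"},
    {"egg", "bread"},
    {"egg", "milk", "bread"},
    {"milk"},
]

rule_support = support(transactions, {"egg", "milk"})          # 2 of 4 transactions
rule_confidence = confidence(transactions, {"egg"}, {"milk"})  # 2 of the 3 with egg
```

In this toy database the rule {egg} => {milk} holds in two of the four transactions, and two of the three transactions that contain egg also contain milk.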
www.ijournals.in
2. INCREMENTAL FREQUENT PATTERN MINING
The mining of frequent patterns in a transactional database is usually an offline process, since it is costly to find frequent patterns in large databases. In market-basket applications, new transactions are generated and old transactions may become obsolete as time advances. As a result, incremental updating techniques should be developed to maintain the discovered frequent itemsets for the updated database. Discovering the frequent item sets becomes more time consuming if the dataset is incremental in nature. In an incremental dataset, new records are added from time to time, and existing transactions are also deleted from the database from time to time.
Figure 2: An illustrative transaction database
There are several important characteristics in the updated database:
1. The update problem can be reduced to finding the new set of frequent item sets. After that, the new association rules can be computed from the new frequent item sets.
2. An old frequent item set has the potential to become infrequent in the updated database.
3. Similarly, an old infrequent item set could become frequent in the new database.
4. In order to find the new frequent item sets "exactly", all the records in the updated database, including those from the original database, have to be checked against every candidate set.
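The second and third characteristics can be seen on a small example. The following Python sketch uses a hypothetical five-transaction database and a hypothetical helper `frequent_items`; it simply recounts single items before and after an update, and uses integer percentages to avoid floating-point threshold issues.

```python
def frequent_items(transactions, min_support_pct):
    """Return the single items whose support is at least min_support_pct percent."""
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    n = len(transactions)
    # Integer comparison: count / n >= pct / 100
    return {item for item, c in counts.items() if c * 100 >= min_support_pct * n}


# Hypothetical data (not the database shown in the paper's figures).
old_db = [{"A", "E"}, {"B", "E"}, {"A", "B"}, {"A", "B", "D"}, {"A"}]
added = [{"B", "D"}, {"A", "D"}]
updated_db = old_db[2:] + added  # the first two transactions are deleted

old_frequent = frequent_items(old_db, 40)
new_frequent = frequent_items(updated_db, 40)

losers = old_frequent - new_frequent   # frequent before, infrequent after
winners = new_frequent - old_frequent  # infrequent before, frequent after
```

Here E drops out of the frequent set because the deleted transactions carried its support, while D becomes frequent because of the added transactions.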
3. EXISTING WORK
Several approaches have been developed to solve this problem. Some of them are:
1. FUP (Fast UPdate)
2. MAAP (Maintaining Association rules using Apriori Property)
3. Border Set Algorithm
4. FUP2 (Fast UPdate 2)
Figure 3: Generation of candidate item sets and
frequent item sets using Apriori Algorithm
FUP uses Apriori concepts to mine frequent patterns from the incremental as well as the old database. This method works well when the number of added transactions is small. When the number of added transactions is very large, the method is not efficient: generating candidates in the old and new databases, adding or subtracting their counts, and then comparing their support counts with the given support count is a very time-consuming process. The second important problem with this method is that it works only for incremented databases, not for decremented databases.
3.1 FUP Algorithm
Cheung et al. proposed the FUP algorithm to incrementally maintain association rules when new transactions are inserted [6, 7]. FUP (Fast UPdate) is the first algorithm proposed to solve the problem of incremental mining of association rules. It handles databases with transaction insertion only and is not able to deal with transaction deletion.
Consider a sample transactional database with a minimum support of 40%.
Now three transactions, T10, T11 and T12, are added to this database.
Figure 4: An illustrative database for performing
algorithm FUP and the original frequent itemsets
generated using association rule mining Algorithm(s)
Figure 5: The first iteration of algorithm FUP on the
dataset
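The idea behind the first FUP iteration can be sketched in Python as follows. This is a simplified illustration for 1-itemsets only, not the authors' implementation: the function names, the nine-plus-three transaction data, and the use of an integer percentage for the minimum support are all assumptions made for the example. Old frequent items are updated by scanning only the increment; an item that was infrequent before is considered only if it is frequent within the increment, and only then is the old database scanned for it.

```python
from collections import Counter


def count_items(transactions):
    """Support count of every single item in a list of transactions."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))
    return counts


def fup_first_iteration(old_db, old_frequent_counts, increment, min_support_pct):
    """Return {item: support count} of frequent 1-itemsets in old_db + increment.

    old_frequent_counts holds the known counts of the OLD frequent items only.
    """
    D, d = len(old_db), len(increment)
    inc_counts = count_items(increment)

    # Step 1: old frequent items need only the increment scan.
    new_frequent = {}
    for item, old_count in old_frequent_counts.items():
        total = old_count + inc_counts.get(item, 0)
        if total * 100 >= min_support_pct * (D + d):
            new_frequent[item] = total

    # Step 2: an old infrequent item can win only if it is frequent in the increment.
    candidates = [
        item for item, c in inc_counts.items()
        if item not in old_frequent_counts and c * 100 >= min_support_pct * d
    ]

    # Step 3: scan the old database only for the surviving candidates.
    for item in candidates:
        old_count = sum(1 for t in old_db if item in t)
        total = old_count + inc_counts[item]
        if total * 100 >= min_support_pct * (D + d):
            new_frequent[item] = total
    return new_frequent


# Hypothetical data: nine old transactions and three added ones.
old_db = [
    {"A", "B", "E"}, {"A", "C"}, {"B", "C", "E"},
    {"A", "B", "C"}, {"A", "D", "E"}, {"B", "C"},
    {"A", "B", "C", "E"}, {"C", "D"}, {"A", "B", "D"},
]
old_frequent_counts = {"A": 6, "B": 6, "C": 6, "E": 4}  # 40% of 9 means count >= 4
increment = [{"A", "D"}, {"B", "D"}, {"C", "D"}]

result = fup_first_iteration(old_db, old_frequent_counts, increment, 40)
```

With this data, E loses its frequent status (4 of 12 transactions) and D gains it (6 of 12), while the old database is scanned only for D.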
Figure 6: The second iteration of Algorithm FUP on the dataset
The frequent item sets are {{A}, {B}, {C}, {D}, {A,B}, {A,C}, {B,C}, {A,B,C}}. Clearly, one new item, D, is now also frequent, although it was not frequent in the old database. Similarly, E, which was frequent in the old database, is deleted.
3.2 FUP2 Algorithm
To overcome the problem of FUP [6], a new extension, FUP2 [7], has been developed. FUP2 works not only for incremented databases but also for decremented databases. Consider a simple example in which there are nine transactions in the transaction database. The first three transactions (T1, T2, T3) are deleted and three more transactions (T10, T11, T12) are added. The updated database now has nine transactions (T4, T5, ..., T12). To generate the frequent item sets, we apply the FUP2 method.
Figure 7: Generation of new frequent item sets using Algorithm FUP2
The frequent items of the updated database are now different from those of the old database.
3.3 Border Set Algorithm
The Borders algorithm [8] is the first dynamic algorithm with the capability to handle all cases: transaction addition, transaction addition with deletion, and change in the minimum support threshold. While the incremental algorithms [6, 7] prior to Borders addressed transaction insertion and deletion, they never considered situations where the minimum support needs to change. The Borders algorithm uses the concept of border sets [9].
Figure 8: Border Set Example
An itemset A is considered a border set if A is not frequent but all its proper subsets are frequent. When the database is updated, new frequent item sets are expected to occur only if some border sets have reached the minimum support threshold and hence become frequent item sets, called promoted border sets. The Borders algorithm scans the old database only if some border sets have reached the minimum support and become promoted border sets. As in the Apriori algorithm, the same Apriori concepts are employed in the Borders algorithm to generate candidates. Its costs are the several scans of the old database required when new candidate sets arise, in addition to the cost of updating and maintaining the border sets. However, performance comparisons [8] reveal that the Borders algorithm performs better than FUP.
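The border set definition can be checked mechanically. The following Python sketch is only an illustration of the definition, not of the Borders algorithm itself; the function name `is_border_set` and the example frequent sets are hypothetical, and the empty set is treated as trivially frequent.

```python
from itertools import combinations


def is_border_set(itemset, frequent_itemsets):
    """True if `itemset` is not frequent but every proper non-empty subset is.

    frequent_itemsets is a set of frozensets.
    """
    itemset = frozenset(itemset)
    if not itemset or itemset in frequent_itemsets:
        return False
    for size in range(1, len(itemset)):
        for subset in combinations(itemset, size):
            if frozenset(subset) not in frequent_itemsets:
                return False
    return True


# Hypothetical set of frequent item sets.
frequent = {
    frozenset(s)
    for s in [{"A"}, {"B"}, {"C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}]
}

abc_is_border = is_border_set({"A", "B", "C"}, frequent)  # all subsets are frequent
ad_is_border = is_border_set({"A", "D"}, frequent)        # subset {D} is not frequent
d_is_border = is_border_set({"D"}, frequent)              # infrequent single item
```

In this example {A, B, C} and {D} are border sets, so they are the only item sets whose promotion after an update would trigger a scan of the old database; {A, D} is not a border set because {D} is not frequent.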
4. Problem Statement
There are two main problems in FUP and FUP2:
1. Candidate generation is a time-consuming process, because candidate item sets are generated first and then pruned if they do not satisfy the minimum support value.
2. FUP and FUP2 work well if the number of incremented transactions is small, but if it is large their performance degrades.
5. Proposed Method for DUFP
DUFP uses the vertical data format. The database is converted into vertical format and then simple intersection operations are applied to generate the frequent patterns. Consider the previous example.
Figure 9: An old transaction database
Fig. 10: Updated transaction database
The updated database has nine transactions. The transactional database D' is now converted into a vertical database.
Now assign a code to each row. For example, item set A has code R1. Use S to mark a row as selected for the next step and D to mark it as deleted. Deleted items are not considered in the next step.
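The conversion to vertical format can be sketched in Python as follows. The function name `to_vertical` and the four transactions are hypothetical and do not reproduce the paper's figures; the sketch only shows that, once each item maps to the set of transaction IDs containing it, the support count of an item set is the size of an intersection.

```python
def to_vertical(transactions):
    """Convert {tid: set of items} into {item: set of tids}."""
    vertical = {}
    for tid, items in transactions.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


# Hypothetical horizontal database (transaction ID -> items).
db = {
    "T4": {"A", "B"},
    "T5": {"B", "D"},
    "T6": {"A", "B", "D"},
    "T7": {"C"},
}

vertical = to_vertical(db)

# Support count of {A, B}: transactions common to A and B.
support_ab = len(vertical["A"] & vertical["B"])
```

No rescan of the transactions is needed to count {A, B}; the intersection of the two ID sets gives the answer directly.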
Only four items remain in the table. The two items E and F are deleted because their support counts do not satisfy the minimum support value.
Now perform intersection between the row IDs to generate the two-item candidate sets. Delete those item sets that do not satisfy the minimum support count.
Only two item sets, {A, B} and {B, D}, satisfy the minimum support count.
Now consider the three-item set ABD.
No frequent three-item set is present in the updated database.
The above method finds the same frequent item sets as those generated by FUP and FUP2.
Frequent one-item sets: {A}, {B}, {C}, and {D}.
Frequent two-item sets: {A, B}, {B, D}.
DUFP Algorithm
Initialize: K := 1, C1 = all the 1-item sets;
Read the database to count the support of C1 to determine L1.
L1 := {frequent 1-item sets};
K := 2; // K represents the pass number
While (L(K-1) ≠ Φ) do
Begin
    CK := gen_candidate_itemsets with the given L(K-1)
    Prune (CK)
    For all candidates in CK do
        Count the number of transactions that are common to each item ∈ CK
    LK := all candidates in CK with minimum support;
    K := K + 1;
End
Figure11: Flow chart of DUFP algorithm
The flowchart represents the step-by-step working of the algorithm. In the first step, DUFP converts the database into vertical format. In the second step, it finds the candidate one-item sets C1, and from all candidates it finds the frequent one-item sets L1. Only those items that satisfy the minimum support count are kept in L1 and the remaining items are deleted; this is done in step 3. From L1, the candidate set C2 of two-item sets is generated, and L2 is obtained from C2. This process is repeated until all frequent item sets are generated or no more candidate generation is possible.
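The steps of this section can be put together in a short Python sketch. It is a minimal interpretation of the level-wise procedure under stated assumptions, not the authors' implementation: the names `to_vertical` and `dufp_sketch` are hypothetical, candidates are formed by joining pairs of frequent (k-1)-item sets, pruning uses the Apriori property, and the nine transactions are invented so that the outcome has the same shape as the worked example (four frequent items, two frequent pairs, no frequent triple).

```python
from itertools import combinations


def to_vertical(transactions):
    """Convert {tid: set of items} into {item: set of tids}."""
    vertical = {}
    for tid, items in transactions.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def dufp_sketch(transactions, min_support_pct):
    """Return {frozenset(itemset): support count} for all frequent item sets."""
    n = len(transactions)

    def is_frequent(tids):
        # Integer comparison: len(tids) / n >= pct / 100
        return len(tids) * 100 >= min_support_pct * n

    vertical = to_vertical(transactions)
    # L1: frequent single items with their transaction-ID sets.
    level = {frozenset([i]): t for i, t in vertical.items() if is_frequent(t)}
    frequent = dict(level)
    k = 2
    while level:
        candidates = {}
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) != k or union in candidates:
                continue
            # Prune: every (k-1)-subset must itself be frequent.
            if any(union - {x} not in level for x in union):
                continue
            # Support is counted by intersecting ID sets, with no database scan.
            candidates[union] = level[a] & level[b]
        level = {s: t for s, t in candidates.items() if is_frequent(t)}
        frequent.update(level)
        k += 1
    return {s: len(t) for s, t in frequent.items()}


# Hypothetical updated database with nine transactions (T4 to T12).
db = {
    "T4": {"A", "B"}, "T5": {"A", "B"}, "T6": {"A", "B", "D"},
    "T7": {"A", "B", "D"}, "T8": {"B", "D"}, "T9": {"B", "C", "D"},
    "T10": {"C", "E"}, "T11": {"C", "F"}, "T12": {"C"},
}

result = dufp_sketch(db, 40)  # 40% of 9 transactions means a count of at least 4
```

The database is read once to build the vertical table; every later pass works only on the ID sets of the item sets that survived the previous pass. The source does not spell out how the vertical table is maintained when transactions are added or deleted, so that part is left out of the sketch.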
6. Experimental Results
Comparison between FUP and DUFP on the basis of Minimum support:
The FUP and DUFP algorithms are compared on the basis of minimum support when the database has 1,000 transactions. Figure 12 shows the comparison between the FUP and DUFP algorithms.

Minimum support    FUP Algorithm                  DUFP Algorithm
                   Average time taken to execute  Average time taken to execute
                   (in milliseconds)              (in seconds)
50                 370 × 10^-3                    254 × 10^-3
60                 208 × 10^-3                    156 × 10^-3
70                 117 × 10^-3                    109 × 10^-3
Figure 12: Execution time taken by FUP and DUFP

Figure 12.1: Comparisons between FUP and DUFP
From Figure 12.1 it is clear that when the support count increases, the execution time decreases, because fewer items satisfy the minimum support condition. In the first stage the largest number of items is deleted, namely those that do not satisfy the minimum support condition, so fewer items have to be scanned in the next step. It is clear that FUP takes more time than DUFP.

Comparison between FUP2 and DUFP on the basis of Minimum support:
The FUP2 and DUFP algorithms are compared on the basis of minimum support when the database has 1,000 transactions. Figure 13 shows the comparison between the FUP2 and DUFP algorithms.

Minimum support    FUP2 Algorithm                 DUFP Algorithm
                   Average time taken to execute  Average time taken to execute
                   (in milliseconds)              (in milliseconds)
50                 371 × 10^-3                    254 × 10^-3
60                 203 × 10^-3                    156 × 10^-3
70                 115 × 10^-3                    109 × 10^-3
Figure 13: Execution time taken by FUP2 and DUFP

Figure 13.1: Comparisons between FUP2 and DUFP
From Figure 13.1 it is clear that when the support count increases, the execution time decreases, because fewer items satisfy the minimum support condition. In the first stage the largest number of items is deleted, namely those that do not satisfy the minimum support condition, so fewer items have to be scanned in the next step. It is clear that DUFP takes less time than FUP2, so DUFP is more efficient in terms of execution time.

Comparison between FUP and DUFP on the basis of Number of Records Added:
The FUP and DUFP algorithms are compared on the basis of the number of records added to the database when the minimum support is 70% and the total number of transactions is 1,000. Figure 14 shows the comparison between the FUP and DUFP algorithms.

Number of records added    FUP Algorithm                  DUFP Algorithm
to the database            Average time taken to execute  Average time taken to execute
(in percentage)            (in milliseconds)              (in milliseconds)
100                        118 × 10^-3                    112 × 10^-3
200                        129 × 10^-3                    125 × 10^-3
300                        130 × 10^-3                    128 × 10^-3
Figure 14: Execution time taken by FUP and DUFP
Fig 14.1: Comparison between FUP and DUFP using percentage of records added to the database
Figure 14.1 presents the scalability of the algorithms. Records are added in terms of a percentage of the existing records, for example 100, 200, etc. From Figure 14.1 it is clear that DUFP is more efficient than FUP.
7. Conclusion
From the experimental data presented, it can be concluded that the DUFP-based approach achieves better performance by reducing the number of database scans and hence the computation time. The DUFP algorithm thus behaves better than the FUP and FUP2 algorithms. FUP and FUP2 require a larger number of database scans because they use Apriori-based concepts to generate candidate sets and obtain the frequent patterns only after pruning. In addition, FUP works only for addition, whereas DUFP works for both addition and deletion.
8. Future Work
One direction for future work is to enhance this application for multidimensional databases, so that the application is also able to generate multilevel frequent patterns. The application can also be enhanced to find utility items in the transactions, so that high- and low-utility items can be identified for business planning.
9. References
[1] Introduction to Data Mining and Knowledge Discovery, Third Edition, ISBN: 1-892095-02-5, Two Crows Corporation, 10500 Falls Road, Potomac, MD 20854 (U.S.A.), 1999.
[2] Dunham, M. H., Sridhar S., “Data Mining: Introductory and Advanced Topics”, Pearson Education, New Delhi, ISBN: 81-7758-785-4, 1st Edition, 2006.
[3] Fayyad, U., Piatetsky-Shapiro, G., and Smyth P., “From Data Mining to Knowledge Discovery in Databases,” AI Magazine, American Association for Artificial Intelligence, 1996.
[4] Rakesh Agrawal, T. Imieliński, A. Swami, “Mining association rules between sets of items in large databases”, In: Proceedings of the 1993 ACM SIGMOD international conference on Management of data, SIGMOD '93, 1993, pp. 207-216.
[5] Jogi Suresh, T. Ramanjaneyulu, “Mining Frequent Itemsets Using Apriori Algorithm”, In: Proceeding of International Journal of Computer Trends and Technology, ISSN 2231-2803, Vol. 4, Issue 4, April 2013.
[6] Cheung D W, Han J, Ng V T and Wong C Y (1996). “Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique”, 12th International Conference on Data Engineering, New Orleans, Louisiana.
[7] Cheung D W, Lee D W, and Kao S D (1997). “A General Incremental Technique for Maintaining Discovered Association Rules”, In Proceedings of the fifth International Conference on Database System for Advanced Applications, Melbourne, Australia.
[8] Feldman R, Aumann Y and Lipshtat O (1999). “Borders: An Efficient Algorithm for Association Generation in Dynamic Databases”, Journal of Intelligent Information System, pp. 61–73.
[9] Mannila H. and Toivonen H (1997). “Level wise Search and Borders of Theories in Knowledge Discovery”, Data Mining and Knowledge Discovery, pp. 241–258.