Digital Symposium Collection 2000  

 
 
 
 
 
 

 





















Optimization of Constrained Frequent Set Queries with 2-variable Constraints

Laks V. S. Lakshmanan, Raymond T. Ng, Jiawei Han, and Alex Pang

  View Paper (PDF)  

Return to Data Mining - Association Rules and Decision Trees

Abstract
Currently, there is tremendous interest in providing ad-hoc mining capabilities in database management systems. As a first step towards this goal, in [15] we proposed an architecture for supporting constraint-based, human-centered, exploratory mining of various kinds of rules including associations, introduced the notion of constrained frequent set queries (CFQs), and developed effective pruning optimizations for CFQs with 1-variable (1-var) constraints. While 1-var constraints are useful for constraining the antecedent and consequent separately, many natural examples of CFQs illustrate the need for constraining the antecedent and consequent jointly, for which 2-variable (2-var) constraints are indispensable. Developing pruning optimizations for CFQs with 2-var constraints is the subject of this paper. But this is a difficult problem because: (i) in 2-var constraints, both variables keep changing and, unlike 1-var constraints, there is no fixed target for pruning; (ii) as we show, "conventional" monotonicity-based optimization techniques do not apply effectively to 2-var constraints. The contributions are as follows. (1) We introduce a notion of quasi-succinctness, which allows a quasi-succinct 2-var constraint to be reduced to two succinct 1-var constraints for pruning. (2) We characterize the class of 2-var constraints that are quasi-succinct. (3) We develop heuristic techniques for non-quasi-succinct constraints. Experimental results show the effectiveness of all our techniques. (4) We propose a query optimizer for CFQs and show that for a large class of constraints, the computation strategy generated by the optimizer is ccc-optimal, i.e., minimizing the effort incurred w.r.t. constraint checking and support counting.


References

Note: References link to DBLP on the Web.

[1]
Rakesh Agrawal , Tomasz Imielinski , Arun N. Swami : Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993 : 207-216
[2]
Rakesh Agrawal , Ramakrishnan Srikant : Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994 : 487-499
[3]
Roberto J. Bayardo Jr. : Efficiently Mining Long Patterns from Databases. SIGMOD Conference 1998 : 85-93
[4]
Sergey Brin , Rajeev Motwani , Craig Silverstein : Beyond Market Baskets: Generalizing Association Rules to Correlations. SIGMOD Conference 1997 : 265-276
[5]
Surajit Chaudhuri : Data Mining and Database Systems: Where is the Intersection? Data Engineering Bulletin 21(1) : 4-8(1998)
[6]
David Wai-Lok Cheung , Jiawei Han , Vincent Ng , C. Y. Wong : Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique. ICDE 1996 : 106-114
[7]
Takeshi Fukuda , Yasuhiko Morimoto , Shinichi Morishita , Takeshi Tokuyama : Data Mining Using Two-Dimensional Optimized Accociation Rules: Scheme, Algorithms, and Visualization. SIGMOD Conf. 1996 : 13-23
[8]
Jiawei Han , Yongjian Fu : Discovery of Multiple-Level Association Rules from Large Databases. VLDB 1995 : 420-431
[9]
Tomasz Imielinski , Heikki Mannila : A Database Perspective on Knowledge Discovery. CACM 39(11) : 58-64(1996)
[10]
Micheline Kamber , Jiawei Han , Jenny Chiang : Metarule-Guided Mining of Multi-Dimensional Association Rules Using Data Cubes. KDD 1997 : 207-210
[11]
Mika Klemettinen , Heikki Mannila , Pirjo Ronkainen , Hannu Toivonen , A. Inkeri Verkamo : Finding Interesting Rules from Large Sets of Discovered Association Rules. CIKM 1994 : 401-407
[12]
Flip Korn , Alexandros Labrinidis , Yannis Kotidis , Christos Faloutsos : Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining. VLDB 1998 : 582-593
[13]
...
[14]
Renée J. Miller , Yuping Yang : Association Rules over Interval Data. SIGMOD Conference 1997 : 452-461
[15]
Raymond T. Ng , Laks V. S. Lakshmanan , Jiawei Han , Alex Pang : Exploratory Mining and Pruning Optimizations of Constrained Association Rules. SIGMOD Conference 1998 : 13-24
[16]
Jong Soo Park , Ming-Syan Chen , Philip S. Yu : An Effective Hash Based Algorithm for Mining Association Rules. SIGMOD Conference 1995 : 175-186
[17]
Sridhar Ramaswamy , Sameer Mahajan , Abraham Silberschatz : On the Discovery of Interesting Patterns in Association Rules. VLDB 1998 : 368-379
[18]
Sunita Sarawagi , Shiby Thomas , Rakesh Agrawal : Integrating Mining with Relational Database Systems: Alternatives and Implications. SIGMOD Conference 1998 : 343-354
[19]
Abraham Silberschatz , Stanley B. Zdonik : Database Systems - Breaking Out of the Box. SIGMOD Record 26(3) : 36-50(1997)
[20]
Craig Silverstein , Sergey Brin , Rajeev Motwani , Jeffrey D. Ullman : Scalable Techniques for Mining Causal Structures. VLDB 1998 : 594-605
[21]
Ramakrishnan Srikant , Rakesh Agrawal : Mining Generalized Association Rules. VLDB 1995 : 407-419
[22]
Ramakrishnan Srikant , Rakesh Agrawal : Mining Quantitative Association Rules in Large Relational Tables. SIGMOD Conf. 1996 : 1-12
[23]
Ramakrishnan Srikant , Quoc Vu , Rakesh Agrawal : Mining Association Rules with Item Constraints. KDD 1997 : 67-73
[24]
Hannu Toivonen : Sampling Large Databases for Association Rules. VLDB 1996 : 134-145
[25]
Shalom Tsur , Jeffrey D. Ullman , Serge Abiteboul , Chris Clifton , Rajeev Motwani , Svetlozar Nestorov , Arnon Rosenthal : Query Flocks: A Generalization of Association-Rule Mining. SIGMOD Conference 1998 : 1-12

BIBTEX

@inproceedings{DBLP:conf/sigmod/LakshmananNHP99,
  author    = {Laks V. S. Lakshmanan and
                Raymond T. Ng and
                Jiawei Han and
                Alex Pang},
   editor    = {Alex Delis and
                Christos Faloutsos and
                Shahram Ghandeharizadeh},
   title     = {Optimization of Constrained Frequent Set Queries with 2-variable
                Constraints},
   booktitle = {SIGMOD 1999, Proceedings ACM SIGMOD International Conference
                on Management of Data, June 1-3, 1999, Philadephia, Pennsylvania,
                USA},
   publisher = {ACM Press},
   year      = {1999},
   isbn      = {1-58113-084-8},
   pages     = {157-168},
   crossref  = {DBLP:conf/sigmod/99},
   bibsource = {DBLP, http://dblp.uni-trier.de} } },


























Copyright(C) 2000 ACM