ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Mining Optimized Association Rules for Numeric Attributes.

Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Mining Optimized Association Rules for Numeric Attributes. PODS 1996: 182-191
@inproceedings{DBLP:conf/pods/FukudaMMT96,
  author    = {Takeshi Fukuda and
               Yasuhiko Morimoto and
               Shinichi Morishita and
               Takeshi Tokuyama},
  title     = {Mining Optimized Association Rules for Numeric Attributes},
  booktitle = {Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium
               on Principles of Database Systems, June 3-5, 1996, Montreal,
               Canada},
  publisher = {ACM Press},
  year      = {1996},
  isbn      = {0-89791-781-2},
  pages     = {182-191},
  ee        = {http://doi.acm.org/10.1145/237661.237708, db/conf/pods/FukudaMMT96.html},
  crossref  = {DBLP:conf/pods/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

Given a huge database, we address the problem of finding association rules for numeric attributes, such as

(Balance in I) => (CardLoan = yes),

which implies that bank customers whose balances fall in a range I are likely to use card loan with a probability greater than p. The above rule is interesting only if the range I has some special feature with respect to the interrelation between Balance and CardLoan. It is required that the number of customers whose balances are contained in I (called the support of I) is sufficient, and also that the probability p (called the confidence ratio) should be much higher than the average probability of the condition being met.

Our goal is to realize a system that finds such appropriate ranges automatically. We mainly focus on computing two optimized ranges: one that maximizes the support on the condition that the confidence ratio is at least a given threshold value, and another that maximizes the confidence ratio on the condition that the support is at leat a given threshold number.

Using techniques from computational geometry, we present novel algorithms that compute the optimized ranges in linear time if the data are sorted. Since sorting data with respect to each numeric attribute is expensive in the case of huge databases that occupy much more space than the main memory, we instead apply randomized bucketting as the preprocessing method, an thus obtain an efficient rule-finding system.

Tests show that our implementation is fast not only in theory but also in practice. The efficiency of our algorithm enables us to compute rules for all combinations of hundreds of numeric and Boolean attributes in a reasonable time.

Copyright © 1996 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Load The ACM SIGMOD Anthology, CDROM Edition, Volume 1-3, PODS '82-'98. and ... Load The ACM SIGMOD Anthology, Silver Edition, DVD 1, Proceedings. and ... BibTeX

Printed Edition

Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 3-5, 1996, Montreal, Canada. ACM Press 1996, ISBN 0-89791-781-2
Contents BibTeX

Online Edition: ACM Digital Library

[Index Terms]
[Full Text in PDF Format, 853 KB]

References

[AGI+92]
Rakesh Agrawal, Sakti P. Ghosh, Tomasz Imielinski, Balakrishna R. Iyer, Arun N. Swami: An Interval Classifier for Database Mining Applications. VLDB 1992: 560-573 BibTeX
[AIS93a]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Database Mining: A Performance Perspective. IEEE Trans. Knowl. Data Eng. 5(6): 914-925(1993) BibTeX
[AIS93b]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216 BibTeX
[AS94]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499 BibTeX
[Ben84]
Jon Louis Bentley: Algorithm Design Techniques. Commun. ACM 27(9): 865-871(1984) BibTeX
[BFOS84]
Leo Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone: Classification and Regression Trees. Wadsworth 1984, ISBN 0-534-98053-8
BibTeX
[FMMT96]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Data Mining Using Two-Dimensional Optimized Accociation Rules: Scheme, Algorithms, and Visualization. SIGMOD Conference 1996: 13-23 BibTeX
[HCC92]
Jiawei Han, Yandong Cai, Nick Cercone: Knowledge Discovery in Databases: An Attribute-Oriented Approach. VLDB 1992: 547-559 BibTeX
[MAR96]
Manish Mehta, Rakesh Agrawal, Jorma Rissanen: SLIQ: A Fast Scalable Classifier for Data Mining. EDBT 1996: 18-32 BibTeX
[MR95]
Rajeev Motwani, Prabhakar Raghavan: Randomized Algorithms. Cambridge University Press 1995, ISBN 0-521-47465-5
BibTeX
[NH94]
Raymond T. Ng, Jiawei Han: Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB 1994: 144-155 BibTeX
[PCY95]
Jong Soo Park, Ming-Syan Chen, Philip S. Yu: An Effective Hash Based Algorithm for Mining Association Rules. SIGMOD Conference 1995: 175-186 BibTeX
[PS91]
Gregory Piatetsky-Shapiro: Discovery, Analysis, and Presentation of Strong Rules. Knowledge Discovery in Databases 1991: 229-248 BibTeX
[PSF91]
Gregory Piatetsky-Shapiro, William J. Frawley (Eds.): Knowledge Discovery in Databases. AAAI/MIT Press 1991, ISBN 0-262-62080-4
Contents BibTeX
[Qui93]
J. Ross Quinlan: C4.5: Programs for Machine Learning. Morgan Kaufmann 1993, ISBN 1-55860-238-0
BibTeX
[SA96]
Ramakrishnan Srikant, Rakesh Agrawal: Mining Quantitative Association Rules in Large Relational Tables. SIGMOD Conference 1996: 1-12 BibTeX
[SAD+96]
Michael Stonebraker, Rakesh Agrawal, Umeshwar Dayal, Erich J. Neuhold, Andreas Reuter: DBMS Research at a Crossroads: The Vienna Update. VLDB 1993: 688-692 BibTeX

Referenced by

  1. Shinichi Morishita, Jun Sese: Traversing Itemset Lattice with Statistical Metric Pruning. PODS 2000: 226-236
  2. Johannes Gehrke, Venkatesh Ganti, Raghu Ramakrishnan, Wei-Yin Loh: BOAT-Optimistic Decision Tree Construction. SIGMOD Conference 1999: 169-180
  3. Rajeev Rastogi, Kyuseok Shim: Mining Optimized Support Rules for Numeric Attributes. ICDE 1999: 206-215
  4. Yasuhiko Morimoto, Takeshi Fukuda, Hirofumi Matsuzawa, Takeshi Tokuyama, Kunikazu Yoda: Algorithms for Mining Association Rules for Binary Segmentations of Huge Categorical Databases. VLDB 1998: 380-391
  5. Rajeev Rastogi, Kyuseok Shim: Mining Optimized Association Rules with Categorical and Numeric Attributes. ICDE 1998: 503-512
  6. Takeshi Fukuda, Hirofumi Matsuzawa: Parallel Processing of Multiple Aggregate Queries on Shared-Nothing Multiprocessors. EDBT 1998: 278-292
  7. Yasuhiko Morimoto, Hiromu Ishii, Shinichi Morishita: Efficient Construction of Regression Trees with Range and Region Splitting. VLDB 1997: 166-175
  8. Sergey Brin, Rajeev Motwani, Craig Silverstein: Beyond Market Baskets: Generalizing Association Rules to Correlations. SIGMOD Conference 1997: 265-276
  9. Heikki Mannila: Methods and Problems in Data Mining. ICDT 1997: 41-55
  10. Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules. VLDB 1996: 146-155
  11. Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: SONAR: System for Optimized Numeric AssociationRules. SIGMOD Conference 1996: 553
  12. Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Data Mining Using Two-Dimensional Optimized Accociation Rules: Scheme, Algorithms, and Visualization. SIGMOD Conference 1996: 13-23
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:34:15 2009