ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Background for Association Rules and Cost Estimate of Selected Mining Algorithms.

Jia Liang Han, Ashley W. Plank: Background for Association Rules and Cost Estimate of Selected Mining Algorithms. CIKM 1996: 73-80
@inproceedings{DBLP:conf/cikm/HanP96,
  author    = {Jia Liang Han and
               Ashley W. Plank},
  title     = {Background for Association Rules and Cost Estimate of Selected
               Mining Algorithms},
  booktitle = {CIKM '96, Proceedings of the Fifth International Conference on
               Information and Knowledge Management, November 12 - 16, 1996,
               Rockville, Maryland, USA},
  publisher = {ACM},
  year      = {1996},
  pages     = {73-80},
  ee        = {db/conf/cikm/HanP96.html, http://doi.acm.org/10.1145/238355.238440},
  crossref  = {DBLP:conf/cikm/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

Association rules may be used to represent regular patterns in databases for the purpose of decision support applications. Fast algorithms for mining association rules have been proposed and studied experimentally in the literature. A key to the algorithms is to find large itemsets, i.e., sets of items that are well supported in the database. In this paper, we study association rules from a statistical viewpoint. Assuming statistical independence, we obtain the expected support of an itemset, which we regard as the background. Fluctuation of support of an itemset is noise unless it differs significantly from the background. We introduce the concept of clusters, which are the largest large itemsets. From clusters in a database, we may estimate the number of large itemsets and candidate itemsets (intermediate results), which are important to the space complexity. We consider computation costs of Apriori, AprioriTid, AprioriHybrid, OCD, SETM and DHP algorithms and study their scalability. Our study suggests that the key to costs and scalability is the space complexity of large itemsets and candidate itemsets. If the size of candidate k-itemsets is less than main memory, then the above algorithms scale up well. If the size of candidate k-itemsets is larger than main memory, however, the costs increase very rapidly.

Copyright © 1996 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 2 Issue 4, CIKM, DOLAP, GIS, SIGFIDET, ..." and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

CIKM '96, Proceedings of the Fifth International Conference on Information and Knowledge Management, November 12 - 16, 1996, Rockville, Maryland, USA. ACM 1996
Contents BibTeX

Online Edition

Citation Page BibTeX
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
CIKM 1996 Proceedings, ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:01:52 2009