ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Beyond Market Baskets: Generalizing Association Rules to Correlations.

Sergey Brin, Rajeev Motwani, Craig Silverstein: Beyond Market Baskets: Generalizing Association Rules to Correlations. SIGMOD Conference 1997: 265-276
@inproceedings{DBLP:conf/sigmod/BrinMS97,
  author    = {Sergey Brin and
               Rajeev Motwani and
               Craig Silverstein},
  editor    = {Joan Peckham},
  title     = {Beyond Market Baskets: Generalizing Association Rules to Correlations},
  booktitle = {SIGMOD 1997, Proceedings ACM SIGMOD International Conference
               on Management of Data, May 13-15, 1997, Tucson, Arizona, USA},
  publisher = {ACM Press},
  year      = {1997},
  pages     = {265-276},
  ee        = {http://doi.acm.org/10.1145/253260.253327, db/conf/sigmod/BrinMS97.html},
  crossref  = {DBLP:conf/sigmod/97},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

One of the most well-studied problems in data mining is mining for association rules in market basket data. Association rules, whose significance is measured via support and confidence, are intended to identify rules of the type, "A customer purchasing item A often also purchases item B." Motivated by the goal of generalizing beyond market baskets and the association rules used with them, we develop the notion of mining rules that identify correlations (generalizing associations), and we consider both the absence and presence of items as a basis for generating rules. We propose measuring significance of associations via the chi-squared test for correlation from classical statistics. This leads to a measure that is upward closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between correlated and uncorrelated itemsets in the lattice. We develop pruning strategies and devise an efficient algorithm for the resulting problem. We demonstrate its effectiveness by testing it on census data and finding term dependence in a corpus of text documents, as well as on synthetic data.
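The chi-squared test mentioned in the abstract can be illustrated with a small sketch. The code below is not the authors' mining algorithm; it only computes the classical chi-squared statistic for independence over the presence/absence contingency table of a set of items. The function name `chi_squared`, the basket representation (a list of Python sets), and the tea/coffee counts are assumptions made for illustration.

```python
from itertools import product


def chi_squared(baskets, items):
    """Chi-squared statistic for independence of `items` across `baskets`.

    baskets: list of sets, one set of items per basket (assumed layout).
    items:   the items whose joint presence/absence is tested.
    """
    n = len(baskets)
    items = tuple(items)
    # Marginal probability that each item is present in a basket.
    p = [sum(1 for b in baskets if it in b) / n for it in items]

    # Observed count for every presence/absence pattern (contingency cell).
    observed = {}
    for b in baskets:
        cell = tuple(it in b for it in items)
        observed[cell] = observed.get(cell, 0) + 1

    stat = 0.0
    for cell in product((True, False), repeat=len(items)):
        # Expected count under independence: n times the product of marginals.
        expected = float(n)
        for present, pi in zip(cell, p):
            expected *= pi if present else (1.0 - pi)
        if expected > 0:
            stat += (observed.get(cell, 0) - expected) ** 2 / expected
    return stat


# Hypothetical data: 100 baskets, 25 with tea, 90 with coffee, 20 with both.
baskets = (
    [{"tea", "coffee"}] * 20
    + [{"tea"}] * 5
    + [{"coffee"}] * 70
    + [set()] * 5
)
stat = chi_squared(baskets, ("tea", "coffee"))
```

For two items the table has one degree of freedom, so the statistic is compared against 3.84 to test dependence at the 95% significance level. Because absent items contribute cells to the table, the statistic reflects both presence and absence, which matches the abstract's stated aim.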

Copyright © 1997 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.



Printed Edition

Joan Peckham (Ed.): SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, May 13-15, 1997, Tucson, Arizona, USA. ACM Press 1997, SIGMOD Record 26(2), June 1997

Online Edition: ACM Digital Library


References

[1]
Rakesh Agrawal, Manish Mehta, John C. Shafer, Ramakrishnan Srikant, Andreas Arning, Toni Bollinger: The Quest Data Mining System. KDD 1996: 244-249
[2]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216
[3]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Database Mining: A Performance Perspective. IEEE Trans. Knowl. Data Eng. 5(6): 914-925(1993)
[4]
...
[5]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499
[6]
...
[7]
Martin Dietzfelbinger, Anna R. Karlin, Kurt Mehlhorn, Friedhelm Meyer auf der Heide, Hans Rohnert, Robert Endre Tarjan: Dynamic Perfect Hashing: Upper and Lower Bounds. FOCS 1988: 524-531
[8]
...
[9]
Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, Ramasamy Uthurusamy (Eds.): Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press 1996, ISBN 0-262-56097-6
[10]
Michael L. Fredman, János Komlós, Endre Szemerédi: Storing a Sparse Table with O(1) Worst Case Access Time. J. ACM 31(3): 538-544(1984)
[11]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Mining Optimized Association Rules for Numeric Attributes. PODS 1996: 182-191
[12]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Data Mining Using Two-Dimensional Optimized Association Rules: Scheme, Algorithms, and Visualization. SIGMOD Conference 1996: 13-23
[13]
Jim Gray, Adam Bosworth, Andrew Layman, Hamid Pirahesh: Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. ICDE 1996: 152-159
[14]
Dimitrios Gunopulos, Heikki Mannila, Sanjeev Saluja: Discovering All Most Specific Sentences by Randomized Algorithms. ICDT 1997: 215-229
[15]
Jiawei Han, Yongjian Fu: Discovery of Multiple-Level Association Rules from Large Databases. VLDB 1995: 420-431
[16]
Maurice A. W. Houtsma, Arun N. Swami: Set-Oriented Mining for Association Rules in Relational Databases. ICDE 1995: 25-33
[17]
Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen, A. Inkeri Verkamo: Finding Interesting Rules from Large Sets of Discovered Association Rules. CIKM 1994: 401-407
[18]
...
[19]
...
[20]
...
[21]
...
[22]
...
[23]
...
[24]
Jong Soo Park, Ming-Syan Chen, Philip S. Yu: An Effective Hash Based Algorithm for Mining Association Rules. SIGMOD Conference 1995: 175-186
[25]
...
[26]
Gregory Piatetsky-Shapiro, William J. Frawley (Eds.): Knowledge Discovery in Databases. AAAI/MIT Press 1991, ISBN 0-262-62080-4
[27]
Ashok Savasere, Edward Omiecinski, Shamkant B. Navathe: An Efficient Algorithm for Mining Association Rules in Large Databases. VLDB 1995: 432-444
[28]
Ramakrishnan Srikant, Rakesh Agrawal: Mining Generalized Association Rules. VLDB 1995: 407-419
[29]
Hannu Toivonen: Sampling Large Databases for Association Rules. VLDB 1996: 134-145
[30]
Jennifer Widom: Research Problems in Data Warehousing. CIKM 1995: 25-30

Referenced by

  1. Flip Korn, Alexandros Labrinidis, Yannis Kotidis, Christos Faloutsos: Quantifiable Data Mining Using Ratio Rules. VLDB J. 8(3-4): 254-266(2000)
  2. David Gibson, Jon M. Kleinberg, Prabhakar Raghavan: Clustering Categorical Data: An Approach Based on Dynamical Systems. VLDB J. 8(3-4): 222-236(2000)
  3. Ke Wang, Yu He, Jiawei Han: Mining Frequent Itemsets Using Support Constraints. VLDB 2000: 43-52
  4. Theodore Johnson, Laks V. S. Lakshmanan, Raymond T. Ng: The 3W Model and Algebra for Unified Data Mining. VLDB 2000: 21-32
  5. Jiawei Han, Jian Pei, Yiwen Yin: Mining Frequent Patterns without Candidate Generation. SIGMOD Conference 2000: 1-12
  6. Shinichi Morishita, Jun Sese: Traversing Itemset Lattice with Statistical Metric Pruning. PODS 2000: 226-236
  7. Edwin M. Knorr, Raymond T. Ng: Finding Intensional Knowledge of Distance-Based Outliers. VLDB 1999: 211-222
  8. Raymond T. Ng, Laks V. S. Lakshmanan, Jiawei Han, Teresa Mah: Exploratory Mining via Constrained Frequent Set Queries. SIGMOD Conference 1999: 556-558
  9. Laks V. S. Lakshmanan, Raymond T. Ng, Jiawei Han, Alex Pang: Optimization of Constrained Frequent Set Queries with 2-variable Constraints. SIGMOD Conference 1999: 157-168
  10. Jean-François Boulicaut, Patrick Marcel, Christophe Rigotti: Query Driven Knowledge Discovery in Multidimensional Data. DOLAP 1999: 87-93
  11. Philip S. Yu: Data Mining and Personalization Technologies. DASFAA 1999: 6-13
  12. Eui-Hong Han, George Karypis, Vipin Kumar, Bamshad Mobasher: Hypergraph Based Clustering in High-Dimensional Data Sets: A Summary of Results. IEEE Data Eng. Bull. 21(1): 15-22(1998)
  13. Charu C. Aggarwal, Philip S. Yu: Mining Large Itemsets for Association Rules. IEEE Data Eng. Bull. 21(1): 23-31(1998)
  14. Craig Silverstein, Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman: Scalable Techniques for Mining Causal Structures. VLDB 1998: 594-605
  15. Flip Korn, Alexandros Labrinidis, Yannis Kotidis, Christos Faloutsos: Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining. VLDB 1998: 582-593
  16. Soumen Chakrabarti, Sunita Sarawagi, Byron Dom: Mining Surprising Patterns Using Temporal Description Length. VLDB 1998: 606-617
  17. Raymond T. Ng, Laks V. S. Lakshmanan, Jiawei Han, Alex Pang: Exploratory Mining and Pruning Optimizations of Constrained Association Rules. SIGMOD Conference 1998: 13-24
  18. Charu C. Aggarwal, Philip S. Yu: A New Framework For Itemset Generation. PODS 1998: 18-24
  19. Cecil Chua Eng Huang, Roger H. L. Chiang, Ee-Peng Lim: A Heuristic Method for Correlating Attribute Group Pairs in Data Mining. ER Workshops 1998: 29-40
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:40:37 2009