Integrating Mining with Relational Database Systems: Alternatives and Implications.

Sunita Sarawagi, Shiby Thomas, Rakesh Agrawal: Integrating Mining with Relational Database Systems: Alternatives and Implications. SIGMOD Conference 1998: 343-354
@inproceedings{DBLP:conf/sigmod/SarawagiTA98,
  author    = {Sunita Sarawagi and
               Shiby Thomas and
               Rakesh Agrawal},
  editor    = {Laura M. Haas and
               Ashutosh Tiwary},
  title     = {Integrating Mining with Relational Database Systems: Alternatives
               and Implications},
  booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
               on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
  publisher = {ACM Press},
  year      = {1998},
  isbn      = {0-89791-995-5},
  pages     = {343-354},
  ee        = {http://doi.acm.org/10.1145/276304.276335, db/conf/sigmod/SarawagiTA98.html},
  crossref  = {DBLP:conf/sigmod/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Data mining on large data warehouses is becoming increasingly important. In support of this trend, we consider a spectrum of architectural alternatives for coupling mining with database systems. These alternatives include: loose-coupling through a SQL cursor interface; encapsulation of a mining algorithm in a stored procedure; caching the data to a file system on-the-fly and mining; tight-coupling using primarily user-defined functions; and SQL implementations for processing in the DBMS. We comprehensively study the option of expressing the mining algorithm in the form of SQL queries using Association rule mining as a case in point. We consider four options in SQL-92 and six options in SQL enhanced with object-relational extensions (SQL-OR). Our evaluation of the different architectural alternatives shows that from a performance perspective, the Cache-Mine option is superior, although the performance of the SQL-OR option is within a factor of two. Both the Cache-Mine and the SQL-OR approaches incur a higher storage penalty than the loose-coupling approach which performance-wise is a factor of 3 to 4 worse than Cache-Mine. The SQL-92 implementations were too slow to qualify as a competitive option. We also compare these alternatives on the basis of qualitative factors like automatic parallelization, development ease, portability and inter-operability.
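To make the idea of expressing association rule mining as SQL queries concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not one of the SQL-92 or SQL-OR formulations evaluated in the paper; it only illustrates the general approach of counting the support of item pairs with a self-join. The table name sales, its (tid, item) layout, and the function name frequent_pairs are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema (an assumption, not taken from the paper):
# one (tid, item) row per distinct item in each transaction.
PAIR_QUERY = """
SELECT a.item, b.item, COUNT(*) AS support
FROM sales AS a
JOIN sales AS b ON a.tid = b.tid AND a.item < b.item
WHERE a.item IN (SELECT item FROM sales GROUP BY item
                 HAVING COUNT(*) >= :minsup)
  AND b.item IN (SELECT item FROM sales GROUP BY item
                 HAVING COUNT(*) >= :minsup)
GROUP BY a.item, b.item
HAVING COUNT(*) >= :minsup
ORDER BY a.item, b.item
"""


def frequent_pairs(transactions, min_support):
    """Return (item1, item2, support) for pairs in >= min_support transactions.

    transactions maps a transaction id to an iterable of item names.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
        # Deduplicate items per transaction so each pair is counted once.
        rows = [(tid, item)
                for tid, items in transactions.items()
                for item in set(items)]
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # The self-join enumerates co-occurring pairs; the IN subqueries
        # keep only pairs whose individual items are themselves frequent.
        return conn.execute(PAIR_QUERY, {"minsup": min_support}).fetchall()
    finally:
        conn.close()
```

Calling frequent_pairs({1: ["bread", "milk"], 2: ["bread", "milk", "eggs"]}, 2) would report the single pair bread and milk with support 2. Extending the sketch to larger itemsets needs one further join per added item, which hints at why plain SQL formulations can become expensive.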

Copyright © 1998 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Printed Edition

Laura M. Haas, Ashutosh Tiwary (Eds.): SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA. ACM Press 1998, ISBN 0-89791-995-5. Also published as SIGMOD Record 27(2), June 1998.

References

[1] Rakesh Agrawal, Manish Mehta, John C. Shafer, Ramakrishnan Srikant, Andreas Arning, Toni Bollinger: The Quest Data Mining System. KDD 1996: 244-249
[2] Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216
[3] ...
[4] Rakesh Agrawal, John C. Shafer: Parallel Mining of Association Rules. IEEE Trans. Knowl. Data Eng. 8(6): 962-969 (1996)
[5] Rakesh Agrawal, Kyuseok Shim: Developing Tightly-Coupled Data Mining Applications on a Relational Database System. KDD 1996: 287-290
[6] Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, Shalom Tsur: Dynamic Itemset Counting and Implication Rules for Market Basket Data. SIGMOD Conference 1997: 255-264
[7] Donald D. Chamberlin: Using the New DB2: IBM's Object-Relational Database System. Morgan Kaufmann 1996, ISBN 1-55860-373-5
[8] Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, Ramasamy Uthurusamy (Eds.): Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press 1996, ISBN 0-262-56097-6
[9] ...
[10] Maurice A. W. Houtsma, Arun N. Swami: Set-Oriented Mining for Association Rules in Relational Databases. ICDE 1995: 25-33
[11] ...
[12] Tomasz Imielinski, Heikki Mannila: A Database Perspective on Knowledge Discovery. Commun. ACM 39(11): 58-64 (1996)
[13] ...
[14] ...
[15] Krishna G. Kulkarni: Object-Oriented Extensions in SQL3: A Status Report. SIGMOD Conference 1994: 478
[16] Jim Melton, Alan R. Simon: Understanding the New SQL: A Complete Guide. Morgan Kaufmann 1993, ISBN 1-55860-245-3
[17] Rosa Meo, Giuseppe Psaila, Stefano Ceri: A New SQL-like Operator for Mining Association Rules. VLDB 1996: 122-133
[18] ...
[19] Berthold Reinwald, Hamid Pirahesh: SQL Open Heterogeneous Data Access. SIGMOD Conference 1998: 506-507
[20] ...
[21] ...
[22] Ramakrishnan Srikant, Rakesh Agrawal: Mining Generalized Association Rules. VLDB 1995: 407-419
[23] Ramakrishnan Srikant, Rakesh Agrawal: Mining Sequential Patterns: Generalizations and Performance Improvements. EDBT 1996: 3-17
[24] Hannu Toivonen: Sampling Large Databases for Association Rules. VLDB 1996: 134-145
[25] Shalom Tsur, Jeffrey D. Ullman, Serge Abiteboul, Chris Clifton, Rajeev Motwani, Svetlozar Nestorov, Arnon Rosenthal: Query Flocks: A Generalization of Association-Rule Mining. SIGMOD Conference 1998: 1-12
[26] ...

Referenced by

  1. Surajit Chaudhuri: Review - Integrating Mining with Relational Database Systems: Alternatives and Implications. ACM SIGMOD Digital Review 2: (2000)
  2. Haixun Wang, Carlo Zaniolo: Using SQL to Build New Aggregates and Extenders for Object-Relational Systems. VLDB 2000: 166-175
  3. Theodore Johnson, Laks V. S. Lakshmanan, Raymond T. Ng: The 3W Model and Algebra for Unified Data Mining. VLDB 2000: 21-32
  4. Jiawei Han, Jian Pei, Yiwen Yin: Mining Frequent Patterns without Candidate Generation. SIGMOD Conference 2000: 1-12
  5. Damianos Chatziantoniou: Evaluation of Ad Hoc OLAP: In-Place Computation. SSDBM 1999: 34-43
  6. Raymond T. Ng, Laks V. S. Lakshmanan, Jiawei Han, Teresa Mah: Exploratory Mining via Constrained Frequent Set Queries. SIGMOD Conference 1999: 556-558
  7. Laks V. S. Lakshmanan, Raymond T. Ng, Jiawei Han, Alex Pang: Optimization of Constrained Frequent Set Queries with 2-variable Constraints. SIGMOD Conference 1999: 157-168
  8. Marek Wojciechowski: Mining Various Patterns in Sequential Data in an SQL-like Manner. ADBIS (Short Papers) 1999: 131-138
  9. Surajit Chaudhuri: Data Mining and Database Systems: Where is the Intersection? IEEE Data Eng. Bull. 21(1): 4-8 (1998)
  10. Tomasz Imielinski, Aashu Virmani: Association Rules... and What's Next? Towards Second Generation Data Mining Systems. ADBIS 1998: 6-25
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:40:44 2009