Computing Iceberg Queries Efficiently.

Min Fang, Narayanan Shivakumar, Hector Garcia-Molina, Rajeev Motwani, Jeffrey D. Ullman: Computing Iceberg Queries Efficiently. VLDB 1998: 299-310
@inproceedings{DBLP:conf/vldb/FangSGMU98,
  author    = {Min Fang and
               Narayanan Shivakumar and
               Hector Garcia-Molina and
               Rajeev Motwani and
               Jeffrey D. Ullman},
  editor    = {Ashish Gupta and
               Oded Shmueli and
               Jennifer Widom},
  title     = {Computing Iceberg Queries Efficiently},
  booktitle = {VLDB'98, Proceedings of 24th International Conference on Very
               Large Data Bases, August 24-27, 1998, New York City, New York,
               USA},
  publisher = {Morgan Kaufmann},
  year      = {1998},
  isbn      = {1-55860-566-5},
  pages     = {299-310},
  ee        = {db/conf/vldb/FangSGMU98.html},
  crossref  = {DBLP:conf/vldb/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Many applications compute aggregate functions over an attribute (or set of attributes) to find aggregate values above some specified threshold. We call such queries iceberg queries, because the number of above-threshold results is often very small (the tip of an iceberg), relative to the large amount of input data (the iceberg). Such iceberg queries are common in many applications, including data warehousing, information retrieval, market basket analysis in data mining, clustering and copy detection. We propose efficient algorithms to evaluate iceberg queries using very little memory and significantly fewer passes over data, when compared to current techniques that use sorting or hashing. We present an experimental case study using over three gigabytes of Web data to illustrate the savings obtained by our algorithms.
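
As a rough illustration of the query shape the abstract describes, the sketch below computes a COUNT aggregate per key and keeps only the keys at or above a threshold. The function names, the `scan` callable, and the `num_buckets` parameter are hypothetical and chosen for this example. The second function is a generic two-pass hash-bucket filter, shown only to suggest how memory use can be bounded; it should not be read as the algorithms proposed in the paper, which the abstract does not spell out.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable


def iceberg_naive(keys: Iterable[Hashable], threshold: int) -> Dict[Hashable, int]:
    """Count every distinct key, then keep those at or above the threshold.

    Roughly the SQL pattern:
    SELECT key, COUNT(*) FROM R GROUP BY key HAVING COUNT(*) >= threshold
    Memory grows with the number of distinct keys.
    """
    counts = Counter(keys)
    return {k: c for k, c in counts.items() if c >= threshold}


def iceberg_two_pass(
    scan: Callable[[], Iterable[Hashable]],
    threshold: int,
    num_buckets: int = 1024,  # hypothetical tuning knob for this sketch
) -> Dict[Hashable, int]:
    """Illustrative two-pass filter (an assumption, not the paper's method).

    `scan` returns a fresh iterator over the keys each time it is called,
    standing in for one pass over the data.
    """
    # Pass 1: count hashed buckets only, so memory is fixed at num_buckets.
    buckets = [0] * num_buckets
    for key in scan():
        buckets[hash(key) % num_buckets] += 1

    # Pass 2: keep exact counts only for keys whose bucket reached the
    # threshold. A key with count >= threshold always lands in such a
    # bucket, so no qualifying key is lost.
    counts: Counter = Counter()
    for key in scan():
        if buckets[hash(key) % num_buckets] >= threshold:
            counts[key] += 1

    # Heavy buckets may contain light keys, so filter once more.
    return {k: c for k, c in counts.items() if c >= threshold}


if __name__ == "__main__":
    data = ["a"] * 5 + ["b"] * 2 + ["c"] * 3 + ["d"]
    print(iceberg_naive(data, 3))
    print(iceberg_two_pass(lambda: iter(data), 3, num_buckets=8))
```

Both functions return the same result; they differ only in how much state is held while scanning, which is the trade-off the abstract points to when it contrasts its approach with sorting or hashing.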

Copyright © 1998 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Printed Edition

Ashish Gupta, Oded Shmueli, Jennifer Widom (Eds.): VLDB'98, Proceedings of 24th International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA. Morgan Kaufmann 1998, ISBN 1-55860-566-5

Referenced by

  1. Junghoo Cho, Narayanan Shivakumar, Hector Garcia-Molina: Finding Replicated Web Collections. SIGMOD Conference 2000: 355-366
  2. Stéphane Grumbach, Leonardo Tininini: On the Content of Materialized Aggregate Views. PODS 2000: 47-57
  3. Donko Donjerkovic, Raghu Ramakrishnan: Probabilistic Optimization of Top N Queries. VLDB 1999: 411-422
  4. Kevin S. Beyer, Raghu Ramakrishnan: Bottom-Up Computation of Sparse and Iceberg CUBEs. SIGMOD Conference 1999: 359-370