ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Wavelet-Based Histograms for Selectivity Estimation.

Yossi Matias, Jeffrey Scott Vitter, Min Wang: Wavelet-Based Histograms for Selectivity Estimation. SIGMOD Conference 1998: 448-459
@inproceedings{DBLP:conf/sigmod/MatiasVW98,
  author    = {Yossi Matias and
               Jeffrey Scott Vitter and
               Min Wang},
  editor    = {Laura M. Haas and
               Ashutosh Tiwary},
  title     = {Wavelet-Based Histograms for Selectivity Estimation},
  booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
               on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
  publisher = {ACM Press},
  year      = {1998},
  isbn      = {0-89791-995-5},
  pages     = {448-459},
  ee        = {http://doi.acm.org/10.1145/276304.276344, db/conf/sigmod/MatiasVW98.html},
  crossref  = {DBLP:conf/sigmod/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

Query optimization is an integral part of relational database management systems. One important task in query optimization is selectivity estimation, that is, given a query P, we need to estimate the fraction of records in the database that satisfy P. Many commercial database systems maintain histograms to approximate the frequency distribution of values in the attributes of relations.

In this paper, we present a technique based upon a multiresolution wavelet decomposition for building histograms on the underlying data distributions, with applications to databases, statistics, and simulation. Histograms built on the cumulative data distributions give very good approximations with limited space usage. We give fast algorithms for constructing histograms and using them in an on-line fashion for selectivity estimation. Our histograms also provide quick approximate answers to OLAP queries when the exact answers are not required. Our method captures the joint distribution of multiple attributes effectively, even when the attributes are correlated. Experiments confirm that our histograms offer substantial improvements in accuracy over random sampling and other previous approaches.

Copyright © 1998 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD DiSC

CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ... Online Version (ACM WWW Account required): Full Text in PDF Format

ACM SIGMOD Anthology

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Laura M. Haas, Ashutosh Tiwary (Eds.): SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA. ACM Press 1998, ISBN 0-89791-995-5 BibTeX , SIGMOD Record 27(2), June 1998
Contents

Online Edition: ACM SIGMOD

[Abstract]
[Full Text (Postscript)]

References

[1]
...
[2]
...
[3]
Phillip B. Gibbons, Yossi Matias: New Sampling-Based Summary Statistics for Improving Approximate Query Answers. SIGMOD Conference 1998: 331-342 BibTeX
[4]
Phillip B. Gibbons, Yossi Matias, Viswanath Poosala: Fast Incremental Maintenance of Approximate Histograms. VLDB 1997: 466-475 BibTeX
[5]
Peter J. Haas, Arun N. Swami: Sequential Sampling Procedures for Query Size Estimation. SIGMOD Conference 1992: 341-350 BibTeX
[6]
Peter J. Haas, Arun N. Swami: Sampling-Based Selectivity Estimation for Joins Using Augmented Frequent Value Statistics. ICDE 1995: 522-531 BibTeX
[7]
Ching-Tien Ho, Rakesh Agrawal, Nimrod Megiddo, Ramakrishnan Srikant: Range Queries in OLAP Data Cubes. SIGMOD Conference 1997: 73-88 BibTeX
[8]
...
[9]
Richard J. Lipton, Jeffrey F. Naughton: Query Size Estimation by Adaptive Sampling. J. Comput. Syst. Sci. 51(1): 18-25(1995) BibTeX
[10]
Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider: Practical Selectivity Estimation through Adaptive Sampling. SIGMOD Conference 1990: 1-11 BibTeX
[11]
M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36 BibTeX
[12]
Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276 BibTeX
[13]
Viswanath Poosala: Histogram-Based Estimation Techniques in Database Systems. Ph.D. thesis, Univ. of Wisconsin-Madison 1997
BibTeX
[14]
...
[15]
Viswanath Poosala, Yannis E. Ioannidis: Selectivity Estimation Without the Attribute Value Independence Assumption. VLDB 1997: 486-495 BibTeX
[16]
Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305 BibTeX
[17]
...
[18]
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 BibTeX
[19]
...
[20]
...
[21]
...
[22]
...
[23]
Jeffrey Scott Vitter: Random Sampling with a Reservoir. ACM Trans. Math. Softw. 11(1): 37-57(1985) BibTeX
[24]
...
[25]
Jeffrey Scott Vitter: External Memory Algorithms. PODS 1998: 119-128 BibTeX
[26]
...

Referenced by

  1. Themistoklis Palpanas: Knowledge Discovery in Data Warehouses. SIGMOD Record 29(3): 88-100(2000)
  2. Yossi Matias, Jeffrey Scott Vitter, Min Wang: Dynamic Maintenance of Wavelet-Based Histograms. VLDB 2000: 101-110
  3. Kaushik Chakrabarti, Minos N. Garofalakis, Rajeev Rastogi, Kyuseok Shim: Approximate Query Processing Using Wavelets. VLDB 2000: 111-122
  4. Dimitrios Gunopulos, George Kollios, Vassilis J. Tsotras, Carlotta Domeniconi: Approximating Multi-Dimensional Aggregate Range Queries over Real Attributes. SIGMOD Conference 2000: 463-474
  5. Nick Koudas, S. Muthukrishnan, Divesh Srivastava: Optimal Histograms for Hierarchical Range Queries. PODS 2000: 196-204
  6. Arnd Christian König, Gerhard Weikum: Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation. VLDB 1999: 423-434
  7. Aristides Gionis, Piotr Indyk, Rajeev Motwani: Similarity Search in High Dimensions via Hashing. VLDB 1999: 518-529
  8. Viswanath Poosala, Venkatesh Ganti: Fast Approximate Answers to Aggregate Queries on a Data Cube. SSDBM 1999: 24-33
  9. Jeffrey Scott Vitter, Min Wang: Approximate Computation of Multidimensional Aggregates of Sparse Data Using Wavelets. SIGMOD Conference 1999: 193-204
  10. Björn Blohsfeld, Dieter Korus, Bernhard Seeger: A Comparison of Selectivity Estimators for Range Queries on Metric Attributes. SIGMOD Conference 1999: 239-250
  11. Ashraf Aboulnaga, Surajit Chaudhuri: Self-tuning Histograms: Building Histograms Without Looking at Data. SIGMOD Conference 1999: 181-192
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:40:45 2009