Random Sampling for Histogram Construction: How much is enough?
Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya:
Random Sampling for Histogram Construction: How much is enough?
SIGMOD Conference 1998: 436-447@inproceedings{DBLP:conf/sigmod/ChaudhuriMN98,
author = {Surajit Chaudhuri and
Rajeev Motwani and
Vivek R. Narasayya},
editor = {Laura M. Haas and
Ashutosh Tiwary},
title = {Random Sampling for Histogram Construction: How much is enough?},
booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
publisher = {ACM Press},
year = {1998},
isbn = {0-89791-995-5},
pages = {436-447},
ee = {http://doi.acm.org/10.1145/276304.276343, db/conf/sigmod/ChaudhuriMN98.html},
crossref = {DBLP:conf/sigmod/98},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX
Abstract
Random sampling is a standard technique for constructing (approximate)
histograms for query optimization. However, any real implementation in
commercial products requires solving the hard problem of determining
"How much sampling is enough?" We address this critical question
in the context of equi-height histograms used in many commercial products,
including Microsoft SQL Server. We introduce a conservative error
metric capturing the intuition that for an approximate histogram to have
low error, the error must be small in all regions of the histogram.
We then present a result establishing an optimal bound on the amount of
sampling required for pre-specified error bounds.
We also describe an adaptive page sampling algorithm
which achieves greater efficiency by using all values in a sampled page
but adjusts the amount of sampling depending
on clustering of values in pages. Next, we establish that the problem of
estimating the number of distinct values is provably difficult ,
but propose a new error metric which has a reliable estimator and can
still be exploited by query optimizers to influence the choice of
execution plans.
The algorithm for histogram construction was prototyped on Microsoft SQL
Server 7.0 and we present experimental results showing
that the adaptive algorithm accurately approximates the true histogram
over different data distributions.
Copyright © 1998 by the ACM,
Inc., used by permission. Permission to make
digital or hard copies is granted provided that
copies are not made or distributed for profit or
direct commercial advantage, and that copies show
this notice on the first page or initial screen of
a display along with the full citation.
CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ...
Online Version (ACM WWW Account required): Full Text in PDF Format
DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...
BibTeX
Printed Edition
Laura M. Haas, Ashutosh Tiwary (Eds.):
SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA.
ACM Press 1998, ISBN 0-89791-995-5 BibTeX
,
SIGMOD Record 27(2),
June 1998
Contents
[Abstract]
References
- [1]
- ...
- [2]
- ...
- [3]
- ...
- [4]
- ...
- [5]
- ...
- [6]
- Surajit Chaudhuri, Vivek R. Narasayya:
An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server.
VLDB 1997: 146-155 BibTeX
- [7]
- Sheldon J. Finkelstein, Mario Schkolnick, Paolo Tiberio:
Physical Database Design for Relational Databases.
ACM Trans. Database Syst. 13(1): 91-128(1988) BibTeX
- [8]
- Phillip B. Gibbons, Yossi Matias, Viswanath Poosala:
Fast Incremental Maintenance of Approximate Histograms.
VLDB 1997: 466-475 BibTeX
- [9]
- ...
- [10]
- Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Lynne Stokes:
Sampling-Based Estimation of the Number of Distinct Values of an Attribute.
VLDB 1995: 311-322 BibTeX
- [11]
- Peter J. Haas, Arun N. Swami:
Sequential Sampling Procedures for Query Size Estimation.
SIGMOD Conference 1992: 341-350 BibTeX
- [12]
- Wen-Chi Hou, Gultekin Özsoyoglu, Erdogan Dogdu:
Error-Constraint COUNT Query Evaluation in Relational Databases.
SIGMOD Conference 1991: 278-287 BibTeX
- [13]
- Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja:
Statistical Estimators for Relational Algebra Expressions.
PODS 1988: 276-287 BibTeX
- [14]
- Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja:
Processing Aggregate Relational Queries with Hard Time Constraints.
SIGMOD Conference 1989: 68-77 BibTeX
- [15]
- Yannis E. Ioannidis, Viswanath Poosala:
Balancing Histogram Optimality and Practicality for Query Result Size Estimation.
SIGMOD Conference 1995: 233-244 BibTeX
- [16]
- Yannis E. Ioannidis, Viswanath Poosala:
Histogram-Based Solutions to Diverse Database Estimation Problems.
IEEE Data Eng. Bull. 18(3): 10-18(1995) BibTeX
- [17]
- Yibei Ling, Wei Sun:
An Evaluation of Sampling-Based Size Estimation Methods for Selections in Database Systems.
ICDE 1995: 532-539 BibTeX
- [18]
- Richard J. Lipton, Jeffrey F. Naughton:
Query Size Estimation by Adaptive Sampling.
PODS 1990: 40-46 BibTeX
- [19]
- Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider:
Practical Selectivity Estimation through Adaptive Sampling.
SIGMOD Conference 1990: 1-11 BibTeX
- [20]
- Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri:
Efficient Sampling Strategies for Relational Database Operations.
Theor. Comput. Sci. 116(1&2): 195-226(1993) BibTeX
- [21]
- Rajeev Motwani, Prabhakar Raghavan:
Randomized Algorithms.
Cambridge University Press 1995, ISBN 0-521-47465-5
BibTeX
- [22]
- Jeffrey F. Naughton, S. Seshadri:
On Estimating the Size of Projections.
ICDT 1990: 499-513 BibTeX
- [23]
- ...
- [24]
- ...
- [25]
- Gultekin Özsoyoglu, Kaizheng Du, A. Tjahjana, Wen-Chi Hou, D. Y. Rowland:
On Estimating COUNT, SUM, and AVERAGE.
DEXA 1991: 406-412 BibTeX
- [26]
- Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita:
Improved Histograms for Selectivity Estimation of Range Predicates.
SIGMOD Conference 1996: 294-305 BibTeX
- [27]
- Gregory Piatetsky-Shapiro, Charles Connell:
Accurate Estimation of the Number of Tuples Satisfying a Condition.
SIGMOD Conference 1984: 256-276 BibTeX
- [28]
- Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price:
Access Path Selection in a Relational Database Management System.
SIGMOD Conference 1979: 23-34 BibTeX
- [29]
- George Kingsley Zipf:
Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology.
Addison-Wesley 1949
BibTeX
Referenced by
- Dimitrios Gunopulos, George Kollios, Vassilis J. Tsotras, Carlotta Domeniconi:
Approximating Multi-Dimensional Aggregate Range Queries over Real Attributes.
SIGMOD Conference 2000: 463-474
- Moses Charikar, Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya:
Towards Estimation Error Guarantees for Distinct Values.
PODS 2000: 268-279
- Surajit Chaudhuri, Rajeev Motwani:
On Sampling and Relational Operators.
IEEE Data Eng. Bull. 22(4): 41-46(1999)
- Vijayshankar Raman, Bhaskaran Raman, Joseph M. Hellerstein:
Online Dynamic Reordering for Interactive Data Processing.
VLDB 1999: 709-720
- Arnd Christian König, Gerhard Weikum:
Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation.
VLDB 1999: 423-434
- Aristides Gionis, Piotr Indyk, Rajeev Motwani:
Similarity Search in High Dimensions via Hashing.
VLDB 1999: 518-529
- Donko Donjerkovic, Raghu Ramakrishnan:
Probabilistic Optimization of Top N Queries.
VLDB 1999: 411-422
- Gurmeet Singh Manku, Sridhar Rajagopalan, Bruce G. Lindsay:
Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets.
SIGMOD Conference 1999: 251-262
- Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya:
On Random Sampling over Joins.
SIGMOD Conference 1999: 263-274
- Björn Blohsfeld, Dieter Korus, Bernhard Seeger:
A Comparison of Selectivity Estimators for Range Queries on Metric Attributes.
SIGMOD Conference 1999: 239-250
- Surajit Chaudhuri, Vivek R. Narasayya:
Index Merging.
ICDE 1999: 296-303
- Surajit Chaudhuri, Vivek R. Narasayya:
AutoAdmin 'What-if' Index Analysis Utility.
SIGMOD Conference 1998: 367-378
- Surajit Chaudhuri:
An Overview of Query Optimization in Relational Systems.
PODS 1998: 34-43
BibTeX
ACM SIGMOD Anthology - DBLP:
[Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:40:44 2009