Digital Symposium Collection 2000  

 
 
 
 
 
 

 





















A Comparison of Selectivity Estimators for Range Queries on Metric Attributes

Bjorn Blohsfeld, Dieter Korus, and Bernhard Seeger

  View Paper (PDF)  

Return to Selectivities

Note: The quality of the PDF contained herein reflects that of the material supplied to the DiSC'00 Production Team.

Abstract
In this paper, we present a comparison of nonparametric estimation methods for computing approximations of the selectivities of queries, in particular range queries. In contrast to previous studies, the focus of our comparison is on metric attributes with large domains which occur for example in spatial and temporal databases. We also assume that only small sample sets of the required relations are available for estimating the selectivity. In addition to the popular histogram estimators, our comparison includes so-called kernel estimation methods. Although these methods have been proven to be among the most accurate estimators known in statistics, they have not been considered for selectivity estimation of database queries, so far. We first show how to generate kernel estimators that deliver accurate approximate selectivities of queries. Thereafter, we reveal that two parameters, the number of samples and the so-called smoothing parameter, are important for the accuracy of both kernel estimators and histogram estimators. For histogram estimators, the smoothing parameter determines the number of bins (histogram classes). We first present the optimal smoothing parameter as a function of the number of samples and show how to compute approximations of the optimal parameter. Moreover, we propose a new selectivity estimator that can be viewed as an hybrid of histogram and kernel estimators. Experimental results show the performance of different estimators in practice. We found in our experiments that kernel estimators are most efficient for continuously distributed data sets, whereas for our real data sets the hybrid technique is most promising.


References

Note: References link to DBLP on the Web.

[1]
Chungmin Melvin Chen , Nick Roussopoulos : Adaptive Selectivity Estimation Using Query Feedback. SIGMOD Conference 1994 : 161-172
[2]
Yannis E. Ioannidis , Stavros Christodoulakis : Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results. TODS 18(4) : 709-748(1993)
[3]
Gregory Piatetsky-Shapiro , Charles Connell : Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984 : 256-276
[4]
Yossi Matias , Jeffrey Scott Vitter , Min Wang : Wavelet-Based Histograms for Selectivity Estimation. SIGMOD Conference 1998 : 448-459
[5]
Surajit Chaudhuri , Rajeev Motwani , Vivek R. Narasayya : Random Sampling for Histogram Construction: How much is enough? SIGMOD Conference 1998 : 436-447
[6]
Joseph M. Hellerstein , Peter J. Haas , Helen Wang : Online Aggregation. SIGMOD Conference 1997 : 171-182
[7]
H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Viswanath Poosala , Kenneth C. Sevcik , Torsten Suel : Optimal Histograms with Quality Guarantees. VLDB 1998 : 275-286
[8]
Viswanath Poosala , Yannis E. Ioannidis , Peter J. Haas , Eugene J. Shekita : Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conf. 1996 : 294-305
[9]
Michael V. Mannino , Paicheng Chu , Thomas Sager : Statistical Profile Estimation in Database Systems. Computing Surveys 20(3) : 191-221(1988)
[10]
...
[11]
...
[12]
Patricia G. Selinger , Morton M. Astrahan , Donald D. Chamberlin , Raymond A. Lorie , Thomas G. Price : Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979 : 23-34
[13]
...
[14]
...
[15]
...
[16]
...
[17]
...
[18]
...
[19]
...

BIBTEX

@inproceedings{DBLP:conf/sigmod/BlohsfeldKS99,
  author    = {Bjorn Blohsfeld and
                Dieter Korus and
                Bernhard Seeger},
   editor    = {Alex Delis and
                Christos Faloutsos and
                Shahram Ghandeharizadeh},
   title     = {A Comparison of Selectivity Estimators for Range Queries on Metric
                Attributes},
   booktitle = {SIGMOD 1999, Proceedings ACM SIGMOD International Conference
                on Management of Data, June 1-3, 1999, Philadephia, Pennsylvania,
                USA},
   publisher = {ACM Press},
   year      = {1999},
   isbn      = {1-58113-084-8},
   pages     = {239-250},
   crossref  = {DBLP:conf/sigmod/99},
   bibsource = {DBLP, http://dblp.uni-trier.de} } },


























Copyright(C) 2000 ACM