Wavelet-Based Histograms for Selectivity Estimation
Yossi Matias (Tel Aviv University)
Jeffrey Scott Vitter (Duke University)
Min Wang (Duke University)
Query optimization is an integral part of relational database
management systems. One important task in query optimization is
selectivity estimation, that is, given a query P, we need to
estimate the fraction of records in the database that satisfy P.
Many commercial database systems maintain histograms to approximate
the frequency distribution of values in the attributes of relations.
In this paper, we present a technique based upon a multiresolution
wavelet decomposition for building histograms on the underlying data
distributions, with applications to databases, statistics, and
simulation. Histograms built on the cumulative data distributions give very
good approximations with limited space usage. We give fast algorithms
for constructing histograms and using them in an on-line fashion for
selectivity estimation. Our histograms also provide quick approximate
answers to OLAP queries when the exact answers are not required. Our
method captures the joint distribution of multiple attributes
effectively, even when the attributes are correlated. Experiments
confirm that our histograms offer substantial improvements in accuracy
over random sampling and other previous approaches.