Approximate Computation of Multidimensional Aggregates Using Wavelets
Jeffrey Scott Vitter |       | Min Wang* |
Department of Computer Science, Duke University |       | Department of Computer Science, Duke University |
jsv@cs.duke.edu |       | minw@cs.duke.edu |
In this paper, we present a novel method that provides approximate answers to high-dimensional OLAP aggregation queries in a time and space efficient manner. We construct a \textit{compact data cube}, which is an approximate and space-efficient representation of the underlying mutidimenesional array, based upon multiresolution wavelet decomposition. In the online phase, based on the compact data cube, each aggregation query can generally be answered, depending upon the accuracy supported, in one I/O or a small number of I/Os.
We present an I/O-efficient algorithm to construct our compact data cube for both dense and sparse multidimensional arrays. Current histogram methods are too inefficient to run when the dimensionality is large. Our on-line query processing algorithm is very efficient and capable of refining answers as the user demands more accuracy. Experiments on real data show that our method provides significantly more accurate results for typical OLAP aggregation queries than other efficient approximation techniques such as random sampling.