Approximate Computation of Multidimensional Aggregates Using Wavelets

Jeffrey Scott Vitter       Min Wang*
Department of Computer Science, Duke University       Department of Computer Science, Duke University
jsv@cs.duke.edu       minw@cs.duke.edu

Abstract

Computing multidimensional aggregates across many sets of dimensions is a performance bottleneck for most OLAP applications. In many situations, obtaining the exact answer to an aggregation query is prohibitively expensive in terms of time and/or storage space in data warehouse environment. It is advantageous to have fast, approximate answers to OLAP aggregation queries.

In this paper, we present a novel method that provides approximate answers to high-dimensional OLAP aggregation queries in a time and space efficient manner. We construct a \textit{compact data cube}, which is an approximate and space-efficient representation of the underlying mutidimenesional array, based upon multiresolution wavelet decomposition. In the online phase, based on the compact data cube, each aggregation query can generally be answered, depending upon the accuracy supported, in one I/O or a small number of I/Os.

We present an I/O-efficient algorithm to construct our compact data cube for both dense and sparse multidimensional arrays. Current histogram methods are too inefficient to run when the dimensionality is large. Our on-line query processing algorithm is very efficient and capable of refining answers as the user demands more accuracy. Experiments on real data show that our method provides significantly more accurate results for typical OLAP aggregation queries than other efficient approximation techniques such as random sampling.