Digital Symposium Collection 2000  

 
 
 
 
 
 

 





















Approximate Computation of Multidimensional Aggregates of Sparse Data Using Wavelets

Jeffrey Scott Vitter and Min Wang

  View Paper (PDF)  

Return to Histograms

Abstract
Computing multidimensional aggregates in high dimensions is a performance bottleneck for many OLAP applications. Obtaining the exact answer to an aggregation query can be prohibitively expensive in terms of time and/or storage space in a data warehouse environment. It is advantageous to have fast, approximate answers to OLAP aggregation queries. In this paper, we present a novel method that provides approximate answers to high-dimensional OLAP aggregation queries in massive sparse data sets in a time-efficient and space-efficient manner. We construct a compact data cube, which is an approximate and space-efficient representation of the underlying multidimensional array, based upon a multiresolution wavelet decomposition. In the on-line phase, each aggregation query can generally be answered using the compact data cube in one I/O or a small number of I/Os, depending upon the desired accuracy. We present two I/O-efficient algorithms to construct the compact data cube for the important case of sparse high-dimensional arrays, which often arise in practice. The traditional histogram methods are infeasible for the massive high-dimensional data sets in OLAP applications. Previously developed wavelet techniques are efficient only for dense data. Our on-line query processing algorithm is very fast and capable of refining answers as the user demands more accuracy. Experiments on real data show that our method provides significantly more accurate results for typical OLAP aggregation queries than other efficient approximation techniques such as random sampling.


References

Note: References link to DBLP on the Web.

[AAD+96]
Sameet Agarwal , Rakesh Agrawal , Prasad Deshpande , Ashish Gupta , Jeffrey F. Naughton , Raghu Ramakrishnan , Sunita Sarawagi : On the Computation of Multidimensional Aggregates. VLDB 1996 : 506-521
[AV88]
Alok Aggarwal , Jeffrey Scott Vitter : The Input/Output Complexity of Sorting and Related Problems. CACM 31(9) : 1116-1127(1988)
[BDF+97]
Daniel Barbará , William DuMouchel , Christos Faloutsos , Peter J. Haas , Joseph M. Hellerstein , Yannis E. Ioannidis , H. V. Jagadish , Theodore Johnson , Raymond T. Ng , Viswanath Poosala , Kenneth A. Ross , Kenneth C. Sevcik : The New Jersey Data Reduction Report. Data Engineering Bulletin 20(4) : 3-45(1997)
[Bur]
...
[FJS97]
Christos Faloutsos , H. V. Jagadish , Nikolaos Sidiropoulos : Recovering Information from Summary Data. VLDB 1997 : 36-45
[GBLP96]
Jim Gray , Adam Bosworth , Andrew Layman , Hamid Pirahesh : Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. ICDE 1996 : 152-159
[GM98]
Phillip B. Gibbons , Yossi Matias : New Sampling-Based Summary Statistics for Improving Approximate Query Answers. SIGMOD Conference 1998 : 331-342
[HAMS97]
Ching-Tien Ho , Rakesh Agrawal , Nimrod Megiddo , Ramakrishnan Srikant : Range Queries in OLAP Data Cubes. SIGMOD Conference 1997 : 73-88
[HHW97]
Joseph M. Hellerstein , Peter J. Haas , Helen Wang : Online Aggregation. SIGMOD Conference 1997 : 171-182
[HRU96]
Venky Harinarayan , Anand Rajaraman , Jeffrey D. Ullman : Implementing Data Cubes Efficiently. SIGMOD Conf. 1996 : 205-216
[JS94]
...
[MVW98]
Yossi Matias , Jeffrey Scott Vitter , Min Wang : Wavelet-Based Histograms for Selectivity Estimation. SIGMOD Conference 1998 : 448-459
[PC98]
...
[PI96]
Viswanath Poosala , Yannis E. Ioannidis : Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing. VLDB 1996 : 448-459
[PI97]
Viswanath Poosala , Yannis E. Ioannidis : Selectivity Estimation Without the Attribute Value Independence Assumption. VLDB 1997 : 486-495
[PIHS96]
Viswanath Poosala , Yannis E. Ioannidis , Peter J. Haas , Eugene J. Shekita : Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conf. 1996 : 294-305
[SAC+79]
Patricia G. Selinger , Morton M. Astrahan , Donald D. Chamberlin , Raymond A. Lorie , Thomas G. Price : Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979 : 23-34
[SDS96]
...
[Ven94]
...
[Ven97]
...
[Vit99]
...
[VS94]
Jeffrey Scott Vitter , Elizabeth A. M. Shriver : Algorithms for Parallel Memory I: Two-Level Memories. Algorithmica 12(2/3) : 110-147(1994)
[VV96]
...
[VWI98]
Jeffrey Scott Vitter , Min Wang , Balakrishna R. Iyer : Data Cube Approximation and Histograms via Wavelets. CIKM 1998 : 96-104
[ZDN97]
Yihong Zhao , Prasad Deshpande , Jeffrey F. Naughton : An Array-Based Algorithm for Simultaneous Multidimensional Aggregates. SIGMOD Conference 1997 : 159-170

Referenced by

  1. Chee Yong Chan , Yannis E. Ioannidis : Hierarchical Prefix Cubes for Range-Sum Queries. VLDB 1999 : 675-686
  2. Peter J. Haas : Techniques for Online Exploration of Large Object-Relational Datasets. SSDBM 1999 : 4-12

BIBTEX

@inproceedings{DBLP:conf/sigmod/VitterW99,
  author    = {Jeffrey Scott Vitter and
                Min Wang},
   editor    = {Alex Delis and
                Christos Faloutsos and
                Shahram Ghandeharizadeh},
   title     = {Approximate Computation of Multidimensional Aggregates of Sparse
                Data Using Wavelets},
   booktitle = {SIGMOD 1999, Proceedings ACM SIGMOD International Conference
                on Management of Data, June 1-3, 1999, Philadephia, Pennsylvania,
                USA},
   publisher = {ACM Press},
   year      = {1999},
   isbn      = {1-58113-084-8},
   pages     = {193-204},
   crossref  = {DBLP:conf/sigmod/99},
   bibsource = {DBLP, http://dblp.uni-trier.de} } },


























Copyright(C) 2000 ACM