DiSC - Bottom-Up Computation of Sparse and Iceberg CUBEs

Digital Symposium Collection 2000

Bottom-Up Computation of Sparse and Iceberg CUBEs

Kevin S. Beyer and Raghu Ramakrishnan
View Paper (PDF)

Return to Datacubes and Data Warehouses

Abstract

We introduce the Iceberg-CUBE problem as a reformulation of the datacube (CUBE) problem. The Iceberg-CUBE problem is to compute only those group-by partitions with an aggregate value (e.g., count) above some minimum support threshold. The result of Iceberg-CUBE can be used (1) to answer group-by queries with a clause such as HAVING COUNT(*) >= X, where X is greater than the threshold, (2) for mining multidimensional association rules, and (3) to complement existing strategies for identifying interesting subsets of the CUBE for precomputation. We present a new algorithm (BUC) for Iceberg-CUBE computation. BUC builds the CUBE bottom-up; i.e., it builds the CUBE by starting from a group-by on a single attribute, then a group-by on a pair of attributes, then a group-by on three attributes, and so on. This is the opposite of all techniques proposed earlier for computing the CUBE, and has an important practical advantage: BUC avoids computing the larger group-bys that do not meet minimum support. The pruning in BUC is similar to the pruning in the Apriori algorithm for association rules, except that BUC trades some pruning for locality of reference and reduced memory requirements. BUC uses the same pruning strategy when computing sparse, complete CUBEs. We present a thorough performance evaluation over a broad range of workloads. Our evaluation demonstrates that (in contrast to earlier assumptions) minimizing the aggregations or the number of sorts is not the most important aspect of the sparse CUBE problem. The pruning in BUC, combined with an efficient sort method, enables BUC to outperform all previous algorithms for sparse CUBEs, even for computing entire CUBEs, and to dramatically improve Iceberg-CUBE computation.

References

Note: References link to DBLP on the Web.

[1]: Sameet Agarwal , Rakesh Agrawal , Prasad Deshpande , Ashish Gupta , Jeffrey F. Naughton , Raghu Ramakrishnan , Sunita Sarawagi : On the Computation of Multidimensional Aggregates. VLDB 1996 : 506-521
[2]: Rakesh Agrawal , Ramakrishnan Srikant : Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994 : 487-499
[3]: Elena Baralis , Stefano Paraboschi , Ernest Teniente : Materialized Views Selection in a Multidimensional Database. VLDB 1997 : 156-165
[4]: ...
[5]: Min Fang , Narayanan Shivakumar , Hector Garcia-Molina , Rajeev Motwani , Jeffrey D. Ullman : Computing Iceberg Queries Efficiently. VLDB 1998 : 299-310
[6]: Jim Gray , Adam Bosworth , Andrew Layman , Hamid Pirahesh : Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. ICDE 1996 : 152-159
[7]: Himanshu Gupta : Selection of Views to Materialize in a Data Warehouse. ICDT 1997 : 98-112
[8]: Himanshu Gupta , Venky Harinarayan , Anand Rajaraman , Jeffrey D. Ullman : Index Selection for OLAP. ICDE 1997 : 208-219
[9]: ...
[10]: Venky Harinarayan , Anand Rajaraman , Jeffrey D. Ullman : Implementing Data Cubes Efficiently. SIGMOD Conf. 1996 : 205-216
[11]: Micheline Kamber , Jiawei Han , Jenny Chiang : Metarule-Guided Mining of Multi-Dimensional Association Rules Using Data Cubes. KDD 1997 : 207-210
[12]: Ralph Kimball : The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses. John Wiley 1996, ISBN 0-471-15337-0
[13]: Raymond T. Ng , Laks V. S. Lakshmanan , Jiawei Han , Alex Pang : Exploratory Mining and Pruning Optimizations of Constrained Association Rules. SIGMOD Conference 1998 : 13-24
[14]: Kenneth A. Ross , Divesh Srivastava : Fast Computation of Sparse Datacubes. VLDB 1997 : 116-125
[15]: ...
[16]: ...
[17]: ...
[18]: Amit Shukla , Prasad Deshpande , Jeffrey F. Naughton : Materialized View Selection for Multidimensional Datasets. VLDB 1998 : 488-499
[19]: Yihong Zhao , Prasad Deshpande , Jeffrey F. Naughton : An Array-Based Algorithm for Simultaneous Multidimensional Aggregates. SIGMOD Conference 1997 : 159-170

Referenced by

Jiawei Han : Review - An Array-Based Algorithm for Simultaneous Multidimensional Aggregates. ACM SIGMOD Digital Review 1 : (1999)

BIBTEX

@inproceedings{DBLP:conf/sigmod/BeyerR99,
  author    = {Kevin S. Beyer and
                Raghu Ramakrishnan},
   editor    = {Alex Delis and
                Christos Faloutsos and
                Shahram Ghandeharizadeh},
   title     = {Bottom-Up Computation of Sparse and Iceberg CUBEs},
   booktitle = {SIGMOD 1999, Proceedings ACM SIGMOD International Conference
                on Management of Data, June 1-3, 1999, Philadephia, Pennsylvania,
                USA},
   publisher = {ACM Press},
   year      = {1999},
   isbn      = {1-58113-084-8},
   pages     = {359-370},
   crossref  = {DBLP:conf/sigmod/99},
   bibsource = {DBLP, http://dblp.uni-trier.de} } },

Copyright(C) 2000 ACM