Bottom-Up Cube Computation for Sparse Cubes and Cube Subsets

Kevin Beyer*       Raghu Ramakrishnan
University of Wisconsin - Madison       University of Wisconsin - Madison
beyer@cs.wisc.edu       raghu@cs.wisc.edu

Abstract

We introduce the minsup-cube problem as a reformulation of the datacube (CUBE) problem. The new minsup-cube problem is to compute only those group-bys with a count above some minimum support threshold. The result of minsup-cube can be used (1) to answer queries over any group-by that use HAVING COUNT(*) > X, where X is greater than the threshold, and (2) for mining multidimensional association rules.

We present a new algorithm (BUC) for minsup-cube computation. BUC builds the CUBE bottom-up; i.e., it builds the CUBE by starting from a group-by on a single attribute, then a group-by on a pair of attributes, then a group-by on three attributes, and so on. This is the opposite of all techniques proposed earlier for computing the CUBE, and has an important practical advantage: BUC can avoid computing the larger group-bys that do not meet minimum support. The pruning in BUC is similar to the pruning in the Apriori algorithm for association rules, except that BUC trades some pruning for locality of reference and reduced memory requirements. BUC uses the same pruning strategy when computing sparse, complete CUBEs.

We present a thorough performance evaluation over a broad range of workloads. Our evaluation demonstrates that (in contrast to earlier assumptions) minimizing the aggregations or the number of sorts is not the most important aspect of the sparse CUBE problem. The pruning in BUC, combined with an efficient sort method, enables BUC to outperform all previous algorithms for sparse CUBEs, even for computing entire CUBEs, and to dramatically improve minsup-cube computation.