Fast Computation of Sparse Datacubes.
Kenneth A. Ross, Divesh Srivastava:
Fast Computation of Sparse Datacubes.
VLDB 1997: 116-125@inproceedings{DBLP:conf/vldb/RossS97,
author = {Kenneth A. Ross and
Divesh Srivastava},
editor = {Matthias Jarke and
Michael J. Carey and
Klaus R. Dittrich and
Frederick H. Lochovsky and
Pericles Loucopoulos and
Manfred A. Jeusfeld},
title = {Fast Computation of Sparse Datacubes},
booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
Large Data Bases, August 25-29, 1997, Athens, Greece},
publisher = {Morgan Kaufmann},
year = {1997},
isbn = {1-55860-470-7},
pages = {116-125},
ee = {db/conf/vldb/RossS97.html},
crossref = {DBLP:conf/vldb/97},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX
Abstract
Datacube queries compute aggregates over database relations at a
variety of granularities, and they constitute an important class of
decision support queries. Real-world data is frequently sparse, and
hence efficiently computing datacubes over large sparse relations is
important. We show that current techniques for computing datacubes
over sparse relations do not scale well with the number of CUBE BY
attributes, especially when the relation is much larger than main memory.
We propose a novel algorithm for the fast computation of datacubes
over sparse relations, and demonstrate the efficiency of our algorithm
using synthetic, benchmark and real-world data sets. When the
relation fits in memory, our technique performs multiple in-memory
sorts, and does not incur any I/O beyond the input of the relation and
the output of the datacube itself. When the relation does not fit in
memory, a divide-and-conquer strategy divides the problem of computing
the datacube into several simpler computations of sub-datacubes.
Often, all but one of the sub-datacubes can be computed in memory and
our in-memory solution applies. In that case, the total I/O overhead
is linear in the number of CUBE BY attributes. We demonstrate with an
implementation that the CPU cost of our algorithm is dominated by the
I/O cost for sparse relations.
Copyright © 1997 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Online Paper
CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ...
DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...
BibTeX
Printed Edition
Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.):
VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece.
Morgan Kaufmann 1997, ISBN 1-55860-470-7
Contents BibTeX
Electronic Edition
From CS Dept.,
University Trier (Germany)
References
- [AAD+96]
- Sameet Agarwal, Rakesh Agrawal, Prasad Deshpande, Ashish Gupta, Jeffrey F. Naughton, Raghu Ramakrishnan, Sunita Sarawagi:
On the Computation of Multidimensional Aggregates.
VLDB 1996: 506-521 BibTeX
- [DANR96]
- ...
- [GBLP96]
- Jim Gray, Adam Bosworth, Andrew Layman, Hamid Pirahesh:
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total.
ICDE 1996: 152-159 BibTeX
- [Hoa62]
- C. A. R. Hoare:
Quicksort.
Comput. J. 5(1): 10-15(1962) BibTeX
- [HRU96]
- Venky Harinarayan, Anand Rajaraman, Jeffrey D. Ullman:
Implementing Data Cubes Efficiently.
SIGMOD Conference 1996: 205-216 BibTeX
- [HWL94]
- ...
- [SAG96]
- ...
- [Tra95]
- ...
- [ZDN97]
- Yihong Zhao, Prasad Deshpande, Jeffrey F. Naughton:
An Array-Based Algorithm for Simultaneous Multidimensional Aggregates.
SIGMOD Conference 1997: 159-170 BibTeX
Referenced by
- Frank K. H. A. Dehne, Todd Eavis, Susanne E. Hambrusch, Andrew Rau-Chaplin:
Parallelizing the Data Cube.
ICDT 2001: 129-143
- Weifa Liang, Maria E. Orlowska, Jeffrey Xu Yu:
Optimizing Multiple Dimensional Queries Simultaneously in Multidimensional Databases.
VLDB J. 8(3-4): 319-338(2000)
- Themistoklis Palpanas:
Knowledge Discovery in Data Warehouses.
SIGMOD Record 29(3): 88-100(2000)
- Jiawei Han:
Review - An Array-Based Algorithm for Simultaneous Multidimensional Aggregates.
ACM SIGMOD Digital Review 1: (1999)
- H. V. Jagadish, Laks V. S. Lakshmanan, Divesh Srivastava:
What can Hierarchies do for Data Warehouses?
VLDB 1999: 530-541
- Damianos Chatziantoniou:
Evaluation of Ad Hoc OLAP: In-Place Computation.
SSDBM 1999: 34-43
- Kevin S. Beyer, Raghu Ramakrishnan:
Bottom-Up Computation of Sparse and Iceberg CUBEs.
SIGMOD Conference 1999: 359-370
- Seigo Muto, Masaru Kitsuregawa:
A Dynamic Load Balancing Strategy for Parallel Datacube Computation.
DOLAP 1999: 67-72
- Shih-Fu Chang, Luis Gravano, Gail E. Kaiser, Kenneth A. Ross, Salvatore J. Stolfo:
Database Research at Columbia University.
SIGMOD Record 27(3): 75-80(1998)
- Amit Shukla, Prasad Deshpande, Jeffrey F. Naughton:
Materialized View Selection for Multidimensional Datasets.
VLDB 1998: 488-499
- John R. Smith, Chung-Sheng Li, Vittorio Castelli, Anant Jhingran:
Dynamic Assembly of Views in Data Cubes.
PODS 1998: 274-283
- Kenneth A. Ross, Divesh Srivastava, Damianos Chatziantoniou:
Complex Aggregation at Multiple Granularities.
EDBT 1998: 263-277
- Seigo Muto, Masaru Kitsuregawa:
Improving Main Memory Utilization for Array-Based DataCube Computation.
DOLAP 1998: 28-33
BibTeX
ACM SIGMOD Anthology - DBLP:
[Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:46:15 2009