ACM SIGMOD Anthology VLDB dblp.uni-trier.de

On the Computation of Multidimensional Aggregates.

Sameet Agarwal, Rakesh Agrawal, Prasad Deshpande, Ashish Gupta, Jeffrey F. Naughton, Raghu Ramakrishnan, Sunita Sarawagi: On the Computation of Multidimensional Aggregates. VLDB 1996: 506-521
@inproceedings{DBLP:conf/vldb/AgarwalADGNRS96,
  author    = {Sameet Agarwal and
               Rakesh Agrawal and
               Prasad Deshpande and
               Ashish Gupta and
               Jeffrey F. Naughton and
               Raghu Ramakrishnan and
               Sunita Sarawagi},
  editor    = {T. M. Vijayaraman and
               Alejandro P. Buchmann and
               C. Mohan and
               Nandlal L. Sarda},
  title     = {On the Computation of Multidimensional Aggregates},
  booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
               Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
  publisher = {Morgan Kaufmann},
  year      = {1996},
  isbn      = {1-55860-382-4},
  pages     = {506-521},
  ee        = {db/conf/vldb/AgarwalADGNRS96.html},
  crossref  = {DBLP:conf/vldb/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

At the heart of all OLAP or multidimensional data analysis is the ability to simultaneously aggregate across many sets of dimensions. Computing multidimensional aggregates is a performance bottleneck for these applications. This paper presents fast algorithms for computing a collection of group-bys. We focus on a special case of the aggregation problem - computation of the CUBE operator. The CUBE operator requires computing group-bys on all possible combinations of a list of attributes, and is equivalent to the union of a number of standard group-by operations. We show how the structure of CUBE computations can be viewed in terms of a hierarchy of group-by operations. Our algorithms extend sort-based and hash-based grouping methods with several optimizations, like combining common operations across multiple group-bys, caching, and using pre-computed group-bys for computing other group-bys. Empirical evaluation shows that the resulting algorithms give much better performance compared to straightforward methods.

This paper combines work done concurrently on computing the data cube by two different teams as reported in [SAG96] and [DANR96].

Copyright © 1996 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.): VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India. Morgan Kaufmann 1996, ISBN 1-55860-382-4
Contents BibTeX

Electronic Edition

References

[CM89]
Meng Chang Chen, Lawrence McNamee: On the Data Model and Access Method of Summary Data Management. IEEE Trans. Knowl. Data Eng. 1(4): 519-529(1989) BibTeX
[GBLP96]
Jim Gray, Adam Bosworth, Andrew Layman, Hamid Pirahesh: Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. ICDE 1996: 152-159 BibTeX
[GJ79]
M. R. Garey, David S. Johnson: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman 1979, ISBN 0-7167-1044-7
BibTeX
[GLS94]
Goetz Graefe, Ann Linville, Leonard D. Shapiro: Sort versus Hash Revisited. IEEE Trans. Knowl. Data Eng. 6(6): 934-944(1994) BibTeX
[Gra93]
Goetz Graefe: Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 25(2): 73-170(1993) BibTeX
[HNSS95]
Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Lynne Stokes: Sampling-Based Estimation of the Number of Distinct Values of an Attribute. VLDB 1995: 311-322 BibTeX
[JS96]
...
[PS82]
Christos H. Papadimitriou, Kenneth Steiglitz: Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall 1982, ISBN 0-13-152462-3
BibTeX
[FELL57]
...
[HRU96]
Venky Harinarayan, Anand Rajaraman, Jeffrey D. Ullman: Implementing Data Cubes Efficiently. SIGMOD Conference 1996: 205-216 BibTeX
[GHRU96]
...
[EPST79]
...
[SN95]
Ambuj Shatdal, Jeffrey F. Naughton: Adaptive Parallel Aggregation Algorithms. SIGMOD Conference 1995: 104-114 BibTeX
[SDNR96]
Amit Shukla, Prasad Deshpande, Jeffrey F. Naughton, Karthikeyan Ramasamy: Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies. VLDB 1996: 522-531 BibTeX
[SAG96]
...
[DANR96]
...
[Mic92]
Zbigniew Michalewicz: Statistical and Scientific Databases. Ellis Horwood 1992
BibTeX
[NC95]
...
[CODD93]
...
[FINK]
...
[Sho82]
Arie Shoshani: Statistical Databases: Characteristics, Problems, and some Solutions. VLDB 1982: 208-222 BibTeX
[SR96]
...
[STG95]
...
[STL89]
Jaideep Srivastava, Jack S. Eddy Tan, Vincent Y. Lum: TBSAM: An Access Method for Efficient Processing of Statistical Queries. IEEE Trans. Knowl. Data Eng. 1(4): 414-423(1989) BibTeX
[WELD95]
...

Referenced by

  1. Carlos A. Hurtado, Alberto O. Mendelzon: Reasoning about Summarizability in Heterogeneous Multidimensional Schemas. ICDT 2001: 375-389
  2. Frank K. H. A. Dehne, Todd Eavis, Susanne E. Hambrusch, Andrew Rau-Chaplin: Parallelizing the Data Cube. ICDT 2001: 129-143
  3. Weifa Liang, Maria E. Orlowska, Jeffrey Xu Yu: Optimizing Multiple Dimensional Queries Simultaneously in Multidimensional Databases. VLDB J. 8(3-4): 319-338(2000)
  4. Themistoklis Palpanas: Knowledge Discovery in Data Warehouses. SIGMOD Record 29(3): 88-100(2000)
  5. Stéphane Grumbach, Leonardo Tininini: On the Content of Materialized Aggregate Views. PODS 2000: 47-57
  6. Steven Geffner, Divyakant Agrawal, Amr El Abbadi: The Dynamic Data Cube. EDBT 2000: 237-253
  7. Prasad Deshpande, Jeffrey F. Naughton: Aggregate Aware Caching for Multi-Dimensional Queries. EDBT 2000: 167-182
  8. Peter A. Boncz, Martin L. Kersten: MIL Primitives for Querying a Fragmented World. VLDB J. 8(2): 101-119(1999)
  9. Jianzhong Li, Doron Rotem, Jaideep Srivastava: Aggregation Algorithms for Very Large Compressed Data Warehouses. VLDB 1999: 651-662
  10. H. V. Jagadish, Laks V. S. Lakshmanan, Divesh Srivastava: What can Hierarchies do for Data Warehouses? VLDB 1999: 530-541
  11. Damianos Chatziantoniou: Evaluation of Ad Hoc OLAP: In-Place Computation. SSDBM 1999: 34-43
  12. Jeffrey Scott Vitter, Min Wang: Approximate Computation of Multidimensional Aggregates of Sparse Data Using Wavelets. SIGMOD Conference 1999: 193-204
  13. Yannis Kotidis, Nick Roussopoulos: DynaMat: A Dynamic View Management System for Data Warehouses. SIGMOD Conference 1999: 371-382
  14. Kevin S. Beyer, Raghu Ramakrishnan: Bottom-Up Computation of Sparse and Iceberg CUBEs. SIGMOD Conference 1999: 359-370
  15. Carlos A. Hurtado, Alberto O. Mendelzon, Alejandro A. Vaisman: Maintaining Data Cubes under Dimension Updates. ICDE 1999: 346-355
  16. Seigo Muto, Masaru Kitsuregawa: A Dynamic Load Balancing Strategy for Parallel Datacube Computation. DOLAP 1999: 67-72
  17. Jiawei Han: Towards On-Line Analytical Mining in Large Databases. SIGMOD Record 27(1): 97-107(1998)
  18. Amit Shukla, Prasad Deshpande, Jeffrey F. Naughton: Materialized View Selection for Multidimensional Datasets. VLDB 1998: 488-499
  19. Frédéric Gingras, Laks V. S. Lakshmanan: nD-SQL: A Multi-Dimensional Language for Interoperability and OLAP. VLDB 1998: 134-145
  20. Yannis Kotidis, Nick Roussopoulos: An Alternative Storage Organization for ROLAP Aggregate Views Based on Cubetrees. SIGMOD Conference 1998: 249-258
  21. Prasad Deshpande, Karthikeyan Ramasamy, Amit Shukla, Jeffrey F. Naughton: Caching Multidimensional Queries Using Chunks. SIGMOD Conference 1998: 259-270
  22. John R. Smith, Chung-Sheng Li, Vittorio Castelli, Anant Jhingran: Dynamic Assembly of Views in Data Cubes. PODS 1998: 274-283
  23. Charu C. Aggarwal, Philip S. Yu: Online Generation of Association Rules. ICDE 1998: 402-411
  24. Shin-Chung Shao: Multivariate and Multidimensional OLAP. EDBT 1998: 120-134
  25. Sunita Sarawagi, Rakesh Agrawal, Nimrod Megiddo: Discovery-Driven Exploration of OLAP Data Cubes. EDBT 1998: 168-182
  26. Kenneth A. Ross, Divesh Srivastava, Damianos Chatziantoniou: Complex Aggregation at Multiple Granularities. EDBT 1998: 263-277
  27. Takeshi Fukuda, Hirofumi Matsuzawa: Parallel Processing of Multiple Aggregate Queries on Shared-Nothing Multiprocessors. EDBT 1998: 278-292
  28. Seigo Muto, Masaru Kitsuregawa: Improving Main Memory Utilization for Array-Based DataCube Computation. DOLAP 1998: 28-33
  29. Sunita Sarawagi: Indexing OLAP Data. IEEE Data Eng. Bull. 20(1): 36-43(1997)
  30. Venky Harinarayan: Issues in Interactive Aggregation. IEEE Data Eng. Bull. 20(1): 12-18(1997)
  31. Curtis E. Dyreson: Using an Incomplete Data Cube as a Summary Data Sieve. IEEE Data Eng. Bull. 20(1): 19-26(1997)
  32. Prasad Deshpande, Jeffrey F. Naughton, Karthikeyan Ramasamy, Amit Shukla, Kristin Tufte, Yihong Zhao: Cubing Algorithms, Storage Estimation, and Storage and Processing Alternatives for OLAP. IEEE Data Eng. Bull. 20(1): 3-11(1997)
  33. Kenneth A. Ross, Divesh Srivastava: Fast Computation of Sparse Datacubes. VLDB 1997: 116-125
  34. Marc Gyssens, Laks V. S. Lakshmanan: A Foundation for Multi-dimensional Databases. VLDB 1997: 106-115
  35. Elena Baralis, Stefano Paraboschi, Ernest Teniente: Materialized Views Selection in a Multidimensional Database. VLDB 1997: 156-165
  36. Yihong Zhao, Prasad Deshpande, Jeffrey F. Naughton: An Array-Based Algorithm for Simultaneous Multidimensional Aggregates. SIGMOD Conference 1997: 159-170
  37. Nick Roussopoulos, Yannis Kotidis, Mema Roussopoulos: Cubetree: Organization of and Bulk Updates on the Data Cube. SIGMOD Conference 1997: 89-99
  38. Inderpal Singh Mumick, Dallan Quass, Barinderpal Singh Mumick: Maintenance of Data Cubes and Summary Tables in a Warehouse. SIGMOD Conference 1997: 100-111
  39. Ching-Tien Ho, Rakesh Agrawal, Nimrod Megiddo, Ramakrishnan Srikant: Range Queries in OLAP Data Cubes. SIGMOD Conference 1997: 73-88
  40. Joseph M. Hellerstein, Peter J. Haas, Helen J. Wang: Online Aggregation. SIGMOD Conference 1997: 171-182
  41. Ching-Tien Ho, Jehoshua Bruck, Rakesh Agrawal: Partial-Sum Queries in Data Cubes Using Covering Codes. PODS 1997: 228-237
  42. Rakesh Agrawal, Ashish Gupta, Sunita Sarawagi: Modeling Multidimensional Databases. ICDE 1997: 232-243
  43. Luca Cabibbo, Riccardo Torlone: Querying Multidimensional Databases. DBPL 1997: 319-335
  44. Amit Shukla, Prasad Deshpande, Jeffrey F. Naughton, Karthikeyan Ramasamy: Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies. VLDB 1996: 522-531
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:46:12 2009