What can Hierarchies do for Data Warehouses?

H. V. Jagadish, Laks V. S. Lakshmanan, Divesh Srivastava: What can Hierarchies do for Data Warehouses? VLDB 1999: 530-541

@inproceedings{DBLP:conf/vldb/JagadishLS99,
  author    = {H. V. Jagadish and
               Laks V. S. Lakshmanan and
               Divesh Srivastava},
  editor    = {Malcolm P. Atkinson and
               Maria E. Orlowska and
               Patrick Valduriez and
               Stanley B. Zdonik and
               Michael L. Brodie},
  title     = {What can Hierarchies do for Data Warehouses?},
  booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
               Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
               UK},
  publisher = {Morgan Kaufmann},
  year      = {1999},
  isbn      = {1-55860-615-7},
  pages     = {530-541},
  ee        = {db/conf/vldb/JagadishLS99.html},
  crossref  = {DBLP:conf/vldb/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Data in a warehouse typically has multiple dimensions of interest, such as location, time, and product. It is well recognized that these dimensions have hierarchies defined on them, such as "store-city-state-region" for location. The standard way to model such data is with a star/snowflake schema. However, current approaches do not give first-class status to dimensions. Consequently, a substantial class of interesting queries involving dimension hierarchies and their interaction with the fact tables is quite verbose to write, hard to read, and difficult to optimize.

We propose the SQL(H) model and a natural extension to the SQL query language that gives first-class status to dimensions, and we pin down its semantics. Our model permits structural and schematic heterogeneity in dimension hierarchies, situations that often arise in practice and cannot be modeled satisfactorily using the star/snowflake approach. We show using examples that sophisticated queries involving dimension hierarchies and their interplay with aggregation can be expressed concisely in SQL(H). By comparison, expressing such queries in SQL would involve a union of numerous complex sequences of joins. Finally, we develop an efficient implementation strategy for computing SQL(H) queries, based on an algorithm for hierarchical joins and the use of dimension indexes.
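
To make the heterogeneity point concrete, here is a minimal Python sketch. It is not the paper's SQL(H) syntax, nor its hierarchical join algorithm. It only illustrates, under assumed data, how a roll-up along a "store-city-state-region" hierarchy can be computed by walking ancestors when one store skips the city level. All names (HIERARCHY, SALES, ancestor_at, roll_up) and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical location hierarchy: node -> (level, parent).
# Store "s3" hangs directly under a state and has no city,
# which is a simple case of structural heterogeneity.
HIERARCHY = {
    "s1": ("store", "Austin"),
    "s2": ("store", "Austin"),
    "s3": ("store", "TX"),
    "s4": ("store", "Newark"),
    "Austin": ("city", "TX"),
    "Newark": ("city", "NJ"),
    "TX": ("state", "South"),
    "NJ": ("state", "East"),
    "South": ("region", None),
    "East": ("region", None),
}

# Hypothetical fact rows: (store, sales amount).
SALES = [("s1", 100), ("s2", 50), ("s3", 70), ("s4", 30)]


def ancestor_at(node, level):
    """Walk up from node until a node at the requested level is found."""
    while node is not None:
        node_level, parent = HIERARCHY[node]
        if node_level == level:
            return node
        node = parent
    return None  # this node has no ancestor at that level


def roll_up(facts, level):
    """Sum fact values grouped by each store's ancestor at the given level."""
    totals = defaultdict(int)
    for store, amount in facts:
        group = ancestor_at(store, level)
        if group is not None:  # stores lacking this level are skipped
            totals[group] += amount
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(SALES, "state"))
    print(roll_up(SALES, "city"))
```

In a fixed-depth star/snowflake layout, a store such as "s3" would typically need a placeholder city or a separate join path combined by a union. In this sketch a single ancestor walk handles both shapes, which is the kind of verbosity the abstract refers to.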

Copyright © 1999 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Printed Edition

Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, Michael L. Brodie (Eds.): VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK. Morgan Kaufmann 1999, ISBN 1-55860-615-7

Referenced by

Carlos A. Hurtado, Alberto O. Mendelzon: Reasoning about Summarizability in Heterogeneous Multidimensional Schemas. ICDT 2001: 375-389
Theodore Johnson, Laks V. S. Lakshmanan, Raymond T. Ng: The 3W Model and Algebra for Unified Data Mining. VLDB 2000: 21-32
Nick Koudas, S. Muthukrishnan, Divesh Srivastava: Optimal Histograms for Hierarchical Range Queries. PODS 2000: 196-204
