Digital Symposium Collection 2000  

 
 
 
 
 
 

 





















Generalised Hash Teams for Join and Group-by

Alfons Kemper, Donald Kossmann, and Christian Wiesner

  View Paper (PDF)  

Return to Aggregation, Approximation & Explanation

Abstract
We propose a new class of algorithms that can be used to speed up the execution of multi-way join queries or of queries that involve one or more joins and a group-by. These new evaluation techniques allow to perform several hash-based operations (join and grouping) in one pass without repartitioning intermediate results. These techniques work particularly well for joining hierarchical structures, e.g., for evaluating functional join chains along key/foreign-key relationships. The idea is to generalize the concept of hash teams as proposed by Graefe et.al [GBC98] by indirectly partitioning the input data. Indirect partitioning means to partition the input data on an attribute that is not directly needed for the next hash-based operation, and it involves the construction of bitmaps to approximate the partitioning for the attribute that is needed in the next hash-based operation. Our performance experiments show that such generalized hash teams perform significantly better than conventional strategies for many common classes of decision support queries.


References

Note: References link to DBLP on the Web.

[Bab79]
Edward Babb : Implementing a Relational Database by Means of Specialized Hardware. TODS 4(1) : 1-29(1979)
[Blo70]
Burton H. Bloom : Space/Time Trade-offs in Hash Coding with Allowable Errors. CACM 13(7) : 422-426(1970)
[Bra84]
Kjell Bratbergsengen : Hashing Methods and Relational Algebra Operations. VLDB 1984 : 323-333
[Car75]
Alfonso F. Cardenas : Analysis and Performance of Inverted Data Base Structures. CACM 18(5) : 253-263(1975)
[CI98]
Chee Yong Chan , Yannis E. Ioannidis : Bitmap Index Design and Evaluation. SIGMOD Conference 1998 : 355-366
[CKK98]
...
[CM95]
Sophie Cluet , Guido Moerkotte : Efficient Evaluation of Aggregates on Bulk Types. DBPL 1995 : 8
[CS89]
Walter W. Chang , Hans-Jörg Schek : A Signature Access Method for the Starburst Database System. VLDB 1989 : 145-153
[CS94]
Surajit Chaudhuri , Kyuseok Shim : Including Group-By in Query Optimization. VLDB 1994 : 354-366
[GBC98]
Goetz Graefe , Ross Bunker , Shaun Cooper : Hash Joins and Hash Teams in Microsoft SQL Server. VLDB 1998 : 86-97
[GO95]
Patrick E. O'Neil , Goetz Graefe : Multi-Table Joins Through Bitmapped Join Indices. SIGMOD Record 24(3) : 8-11(1995)
[Gra93]
Goetz Graefe : Query Evaluation Techniques for Large Databases. Computing Surveys 25(2) : 73-170(1993)
[Gra94]
Goetz Graefe : Sort-Merge-Join: An Idea Whose Time Has(h) Passed? ICDE 1994 : 406-417
[GSE+94]
Jim Gray , Prakash Sundaresan , Susanne Englert , Kenneth Baclawski , Peter J. Weinberger : Quickly Generating Billion-Record Synthetic Databases. SIGMOD Conference 1994 : 243-252
[HCLS97]
Laura M. Haas , Michael J. Carey , Miron Livny , Amit Shukla : Seeking the Truth About ad hoc Join Costs. VLDB Journal 6(3) : 241-256(1997)
[HM97]
Sven Helmer , Guido Moerkotte : Evaluation of Main Memory Join Algorithms for Joins with Set Comparison Join Predicates. VLDB 1997 : 386-395
[HR96]
Evan P. Harris , Kotagiri Ramamohanarao : Join Algorithm Costs Revisited. VLDB Journal 5(1) : 64-84(1996)
[HWM98]
Sven Helmer , Till Westmann , Guido Moerkotte : Diag-Join: An Opportunistic Join Algorithm for 1:N Relationships. VLDB 1998 : 98-109
[Lar97]
...
[O'N87]
Patrick E. O'Neil : Model 204 Architecture and Performance. HPTS 1987 : 40-59
[OQ97]
Patrick E. O'Neil , Dallan Quass : Improved Query Performance with Variant Indexes. SIGMOD Conference 1997 : 38-49
[RRS91]
...
[Sha86]
Leonard D. Shapiro : Join Processing in Database Systems with Large Main Memories. TODS 11(3) : 239-264(1986)
[TPC95]
...
[Ull89]
Jeffrey D. Ullman : Principles of Database and Knowledge-Base Systems, Volume II. Computer Science Press 1989, ISBN 0-7167-8162-X
Contents
[VG84]
Patrick Valduriez , Georges Gardarin : Join and Semijoin Algorithms for a Multiprocessor Database Machine. TODS 9(1) : 133-161(1984)
[WB98]
Ming-Chuan Wu , Alejandro P. Buchmann : Encoded Bitmap Indexing for Data Warehouses. ICDE 1998 : 220-230
[VL94]
Weipeng P. Yan , Per-Åke Larson : Performing Group-By before Join. ICDE 1994 : 89-100

BIBTEX

@inproceedings{DBLP:conf/vldb/KemperKW99,
  author    = {Alfons Kemper and
                Donald Kossmann and
                Christian Wiesner},
   editor    = {Malcolm P. Atkinson and
                Maria E. Orlowska and
                Patrick Valduriez and
                Stanley B. Zdonik and
                Michael L. Brodie},
   title     = {Generalised Hash Teams for Join and Group-by},
   booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
                Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
                UK},
   publisher = {Morgan Kaufmann},
   year      = {1999},
   isbn      = {1-55860-615-5},
   pages     = {30-41},
   crossref  = {DBLP:conf/vldb/99},
   bibsource = {DBLP, http://dblp.uni-trier.de} } },


























Copyright(C) 2000 ACM