ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC).

Masaru Kitsuregawa, Yasushi Ogawa: Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC). VLDB 1990: 210-221
@inproceedings{DBLP:conf/vldb/KitsuregawaO90,
  author    = {Masaru Kitsuregawa and
               Yasushi Ogawa},
  editor    = {Dennis McLeod and
               Ron Sacks-Davis and
               Hans-J{\"o}rg Schek},
  title     = {Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash
               Join Method for Data Skew in the Super Database Computer (SDC)},
  booktitle = {16th International Conference on Very Large Data Bases, August
               13-16, 1990, Brisbane, Queensland, Australia, Proceedings},
  publisher = {Morgan Kaufmann},
  year      = {1990},
  isbn      = {1-55860-149-X},
  pages     = {210-221},
  ee        = {db/conf/vldb/KitsuregawaO90.html},
  crossref  = {DBLP:conf/vldb/90},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

The Super Database Computer (SDC) is a high- performance relational database server for a join- intensive environment under development at Universityof Tokyo. SDC is designed to execute a join in a highly parallel way. Compared to other join algorithms, a hash-based algorithm is quite efficient and easily parallelized, and has been employed by many database machines. However, in the presence of data skew, it's hard to distribute load equally among processing modules (PMs) by statically allocating buckets to PMs, as in the conventional parallelizing strategy. Thus, performance is severly degraded.

In this paper, we propose a new parallel hash join method, the bucket spreading strategy, which is robust for data skew. During partitioning relations, each bucket is again divided into fragments of the same size and these fragments are temporarily placed on PMs one by one. Then each bucket is dynamically allocated to a PM which actually carries out the join of the bucket, and all fragments of the bucket are collected inthe corresponding PM. In this way, the bucket spreading strategy evenly distributes the load among the PMs and parallelism is always fully exploited. The architecture of SDC is designed to support the bucket spreading strategy; a mechanism which distributes the buckets flatly among the PMs is embedded in the hardware of the interconnection network. Simulation results confirm that the bucket spreading strategy is robust for data skew and attains very good scalability.

Copyright © 1990 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Dennis McLeod, Ron Sacks-Davis, Hans-Jörg Schek (Eds.): 16th International Conference on Very Large Data Bases, August 13-16, 1990, Brisbane, Queensland, Australia, Proceedings. Morgan Kaufmann 1990, ISBN 1-55860-149-X
BibTeX

References

[Bra84]
Kjell Bratbergsengen: Hashing Methods and Relational Algebra Operations. VLDB 1984: 323-333 BibTeX
[Bra89]
Kjell Bratbergsengen, Torgrim Gjelsvik: The Development of the CROSS8 and HC16-186 Parallel (Database) Computers. IWDM 1989: 359-372 BibTeX
[DeW84]
David J. DeWitt, Randy H. Katz, Frank Olken, Leonard D. Shapiro, Michael Stonebraker, David A. Wood: Implementation Techniques for Main Memory Database Systems. SIGMOD Conference 1984: 1-8 BibTeX
[DeW85]
David J. DeWitt, Robert H. Gerber: Multiprocessor Hash-Based Join Algorithms. VLDB 1985: 151-164 BibTeX
[DeW86]
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High Performance Dataflow Database Machine. VLDB 1986: 228-237 BibTeX
[Ger86]
...
[Hir90]
...
[Kit83]
Masaru Kitsuregawa, Hidehiko Tanaka, Tohru Moto-Oka: Application of Hash to Data Base Machine and Its Architecture. New Generation Comput. 1(1): 63-74(1983) BibTeX
[Kit84]
...
[Kit87]
Masaru Kitsuregawa, Miyuki Nakano, Lilian Harada, Mikio Takagi: Functional Disk System for Relational Database. ICDE 1987: 88-95 BibTeX
[Kit87b]
Masaru Kitsuregawa, Weikang Yang, T. Suzuki, Mikio Takagi: Design and Implementation of High Speed Pipeline Merge Sorter with Run Length Tuning Mechanism. IWDM 1987: 89-102 BibTeX
[Kit89]
Masaru Kitsuregawa, Miyuki Nakano, Mikio Takagi: Query Execution for Large Relations on Functional Disk Systems. ICDE 1989: 159-167 BibTeX
[Kit89b]
...
[Kit89c]
Masaru Kitsuregawa, Masaya Nakayama, Mikio Takagi: The Effect of Bucket Size Tuning in the Dynamic Hybrid GRACE Hash Join Method. VLDB 1989: 257-266 BibTeX
[Lak88]
M. Seetha Lakshmi, Philip S. Yu: Effect of Skew on Join Performance in Parallel Architectures. DPDS 1988: 107-120 BibTeX
[Lak89]
M. Seetha Lakshmi, Philip S. Yu: Limiting Factors of Join Performance on Parallel Processors. ICDE 1989: 488-496 BibTeX
[Law75]
...
[Mit89]
...
[Nak88]
Masaya Nakayama, Masaru Kitsuregawa, Mikio Takagi: Hash-Partitioned Join Method Using Dynamic Destaging Strategy. VLDB 1988: 468-478 BibTeX
[Nec88]
...
[Omi89]
Edward Omiecinski, Eileen Tien Lin: Hash-Based and Index-Based Join Algorithms for Cube and Ring Connected Multicomputers. IEEE Trans. Knowl. Data Eng. 1(3): 329-343(1989) BibTeX
[Rie78]
...
[Sch89]
Donovan A. Schneider, David J. DeWitt: A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment. SIGMOD Conference 1989: 110-121 BibTeX
[Sha86]
Leonard D. Shapiro: Join Processing in Database Systems with Large Main Memories. ACM Trans. Database Syst. 11(3): 239-264(1986) BibTeX
[Sto86]
Michael Stonebraker: The Case for Shared Nothing. IEEE Database Eng. Bull. 9(1): 4-9(1986) BibTeX
[Ter88]
...
[Tan88]
The Tandem Performance Group: A Benchmark of NonStop SQL on the Debit Credit Transaction (Invited Paper). SIGMOD Conference 1988: 337-341 BibTeX
[Tur87]
...
[Val84]
Patrick Valduriez, Georges Gardarin: Join and Semijoin Algorithms for a Multiprocessor Database Machine. ACM Trans. Database Syst. 9(1): 133-161(1984) BibTeX

Referenced by

  1. Seigo Muto, Masaru Kitsuregawa: A Dynamic Load Balancing Strategy for Parallel Datacube Computation. DOLAP 1999: 67-72
  2. Daniel F. Lieuwen: Parallelizing Loops in Database Programming Languages. ICDE 1998: 86-93
  3. Manish Mehta, David J. DeWitt: Data Placement in Shared-Nothing Parallel Database Systems. VLDB J. 6(1): 53-72(1997)
  4. Takayuki Tamura, Masaru Kitsuregawa: Implementation and Evaluation of the Bucket Flattening Omega Network of the Parallel Relational Database Server SDC-II. DASFAA 1997: 471-480
  5. Viswanath Poosala, Yannis E. Ioannidis: Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing. VLDB 1996: 448-459
  6. Luc Bouganim, Daniela Florescu, Patrick Valduriez: Dynamic Load Balancing in Hierarchical Parallel Database Systems. VLDB 1996: 436-447
  7. Chiang Lee, Zue-An Chang: Utilizing Page-Level Join Index for Optimization in Parallel Join Execution. IEEE Trans. Knowl. Data Eng. 7(6): 900-914(1995)
  8. Kien A. Hua, Chiang Lee, Chau M. Hua: Dynamic Load Balancing in Multicomputer Database Systems Using Partition Tuning. IEEE Trans. Knowl. Data Eng. 7(6): 968-983(1995)
  9. Kien A. Hua, Wallapak Tavanapong, Honesty C. Young: A Performance Evaluation of Load Balancing Techniques for Join Operations on Multicomputer Database Systems. ICDE 1995: 44-51
  10. Lilian Harada, Masaru Kitsuregawa: Dynamic Join Product Skew Handling for Hash-Joins in Shared-Nothing Database Systems. DASFAA 1995: 246-255
  11. X. Zhao, Roger G. Johnson, Nigel J. Martin: DBJ - A Dynamic Balancing Hash Join Algorithm in Multiprocessor Database Systems (Extented Abstract). EDBT 1994: 301-308
  12. Kien A. Hua, Yu-lung Lo, Honesty C. Young: Considering Data Skew Factor in Multi-Way Join Query Optimization for Parallel Execution. VLDB J. 2(3): 303-330(1993)
  13. Goetz Graefe: Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 25(2): 73-170(1993)
  14. Ambuj Shatdal, Jeffrey F. Naughton: Using Shared Virtual Memory for Parallel Join Processing. SIGMOD Conference 1993: 119-128
  15. Chiang Lee, Zue-An Chang: Workload Balance and Page Access Scheduling For Parallel Joins In Shared-Nothing Systems. ICDE 1993: 411-418
  16. Priti Mishra, Margaret H. Eich: Join Processing in Relational Databases. ACM Comput. Surv. 24(1): 63-113(1992)
  17. David J. DeWitt, Jim Gray: Parallel Database Systems: The Future of High Performance Database Systems. Commun. ACM 35(6): 85-98(1992)
  18. David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri: Practical Skew Handling in Parallel Joins. VLDB 1992: 27-40
  19. Hongjun Lu, Kian-Lee Tan: Dynamic and Load-balanced Task-Oriented Datbase Query Processing in Parallel Systems. EDBT 1992: 357-372
  20. Edward Omiecinski: Performance Analysis of a Load Balancing Hash-Join Algorithm for a Shared Memory Multiprocessor. VLDB 1991: 375-385
  21. Hongjun Lu, Ming-Chien Shan, Kian-Lee Tan: Optimization of Multi-Way Join Queries for Parallel Execution. VLDB 1991: 549-560
  22. Kien A. Hua, Chiang Lee: Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning. VLDB 1991: 525-535
  23. Masaru Kitsuregawa, Miyuki Nakano, Mikio Takagi: Performance Evaluation of Functional Disk System (FDS-R2). ICDE 1991: 416-425
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:45:43 2009