ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Practical Skew Handling in Parallel Joins.

David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri: Practical Skew Handling in Parallel Joins. VLDB 1992: 27-40
@inproceedings{DBLP:conf/vldb/DeWittNSS92,
  author    = {David J. DeWitt and
               Jeffrey F. Naughton and
               Donovan A. Schneider and
               S. Seshadri},
  editor    = {Li-Yan Yuan},
  title     = {Practical Skew Handling in Parallel Joins},
  booktitle = {18th International Conference on Very Large Data Bases, August
               23-27, 1992, Vancouver, Canada, Proceedings},
  publisher = {Morgan Kaufmann},
  year      = {1992},
  isbn      = {1-55860-151-1},
  pages     = {27-40},
  ee        = {db/conf/vldb/DeWittNSS92.html},
  crossref  = {DBLP:conf/vldb/92},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

We present an approach to dealing with skew in parallel joins in database systems. Our approach is easily implementable within current parallel DBMS, and performs well on skewed data without degrading the performance of the system on non-skewed data. The main idea is to use multiple algorithms, each specialized for a different degree of skew, and to use a small sample of the relations being joined to determine which algorithm is appropriate. We developed, implemented, and experimented with four new skew-handling parallel join algorithms; one, which we call virtual processor range partitioning, was the clear winner in high skew cases, while traditional hybrid hash joinwas the clear winner in lower skew or no skew cases. We present experimental results from an implementation of all four algorithms on the Gamma parallel database machine. To our knowledge, these are the first reported skew-handling numbers from an actual implementation.

Copyright © 1992 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Li-Yan Yuan (Ed.): 18th International Conference on Very Large Data Bases, August 23-27, 1992, Vancouver, Canada, Proceedings. Morgan Kaufmann 1992, ISBN 1-55860-151-1
Contents BibTeX

References

[BFKS87]
Chaitanya K. Baru, Ophir Frieder, Dilip D. Kandlur, Mark E. Segal: Join on a Cube: Analysis, Simulation, and Implementation. IWDM 1987: 61-74 BibTeX
[BGMP79]
Mike W. Blasgen, Jim Gray, Michael F. Mitoma, Thomas G. Price: The Convoy Phenomenon. Operating Systems Review 13(2): 20-25(1979) BibTeX
[Bra87]
Kjell Bratbergsengen: Algebra Operations on a Parallel Computer - Performance Evaluation. IWDM 1987: 415-428 BibTeX
[CDKK85]
Hong-Tai Chou, David J. DeWitt, Randy H. Katz, Anthony C. Klug: Design and Implementation of the Wisconsin Storage System. Softw., Pract. Exper. 15(10): 943-962(1985) BibTeX
[Coc77]
William G. Cochran: Sampling Techniques, 3rd Edition. John Wiley 1977, ISBN 0-471-16240-X
BibTeX
[CW79]
Larry Carter, Mark N. Wegman: Universal Classes of Hash Functions. J. Comput. Syst. Sci. 18(2): 143-154(1979) BibTeX
[DG85]
David J. DeWitt, Robert H. Gerber: Multiprocessor Hash-Based Join Algorithms. VLDB 1985: 151-164 BibTeX
[DG92]
David J. DeWitt, Jim Gray: Parallel Database Systems: The Future of High Performance Database Systems. Commun. ACM 35(6): 85-98(1992) BibTeX
[DGG++86]
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High Performance Dataflow Database Machine. VLDB 1986: 228-237 BibTeX
[DGS88]
David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider: A Performance Analysis of the Gamma Database Machine. SIGMOD Conference 1988: 350-360 BibTeX
[DGS+90]
David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen: The Gamma Database Machine Project. IEEE Trans. Knowl. Data Eng. 2(1): 44-62(1990) BibTeX
[DNS91a]
David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider: An Evaluation of Non-Equijoin Algorithms. VLDB 1991: 443-452 BibTeX
[DNS91b]
David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider: Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting. PDIS 1991: 280-291 BibTeX
[ESW78]
Robert S. Epstein, Michael Stonebraker, Eugene Wong: Distributed Query Processing in a Relational Data Base System. SIGMOD Conference 1978: 169-180 BibTeX
[Gra69]
Ronald L. Graham: Bounds on Multiprocessing Timing Anomalies. SIAM Journal of Applied Mathematics 17(2): 416-429(1969) BibTeX
[HL91]
Kien A. Hua, Chiang Lee: Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning. VLDB 1991: 525-535 BibTeX
[KO90]
Masaru Kitsuregawa, Yasushi Ogawa: Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC). VLDB 1990: 210-221 BibTeX
[KTMo83]
Masaru Kitsuregawa, Hidehiko Tanaka, Tohru Moto-Oka: Application of Hash to Data Base Machine and Its Architecture. New Generation Comput. 1(1): 63-74(1983) BibTeX
[LY90]
M. Seetha Lakshmi, Philip S. Yu: Effectiveness of Parallel Joins. IEEE Trans. Knowl. Data Eng. 2(4): 410-424(1990) BibTeX
[Omi91]
Edward Omiecinski: Performance Analysis of a Load Balancing Hash-Join Algorithm for a Shared Memory Multiprocessor. VLDB 1991: 375-385 BibTeX
[OR89]
Frank Olken, Doron Rotem: Random Sampling from B+ Trees. VLDB 1989: 269-277 BibTeX
[ORX90]
Frank Olken, Doron Rotem, Ping Xu: Random Sampling from Hash Files. SIGMOD Conference 1990: 375-386 BibTeX
[SD89]
Donovan A. Schneider, David J. DeWitt: A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment. SIGMOD Conference 1989: 110-121 BibTeX
[SN92]
S. Seshadri, Jeffrey F. Naughton: Sampling Issues in Parallel Database Systems. EDBT 1992: 328-343 BibTeX
[Sto86]
Michael Stonebraker: The Case for Shared Nothing. IEEE Database Eng. Bull. 9(1): 4-9(1986) BibTeX
[WDJ91]
Christopher B. Walton, Alfred G. Dale, Roy M. Jenevein: A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins. VLDB 1991: 537-548 BibTeX
[WDYT90]
Joel L. Wolf, Daniel M. Dias, Philip S. Yu, John Turek: An Effective Algorithm for Parallelizing Hash Joins in the Presence of Data Skew. ICDE 1991: 200-209 BibTeX

Referenced by

  1. Seigo Muto, Masaru Kitsuregawa: A Dynamic Load Balancing Strategy for Parallel Datacube Computation. DOLAP 1999: 67-72
  2. Michael J. Carey, Donald Kossmann: Reducing the Braking Distance of an SQL Query Engine. VLDB 1998: 158-169
  3. Manish Mehta, David J. DeWitt: Data Placement in Shared-Nothing Parallel Database Systems. VLDB J. 6(1): 53-72(1997)
  4. Sakti Pramanik, Walid R. Tout: The NUMA with Clusters of Processors for Parallel Join. IEEE Trans. Knowl. Data Eng. 9(4): 653-660(1997)
  5. Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997)
  6. Jignesh M. Patel, Jie-Bing Yu, Navin Kabra, Kristin Tufte, Biswadeep Nag, Josef Burger, Nancy E. Hall, Karthikeyan Ramasamy, Roger Lueder, Curt J. Ellmann, Jim Kupsch, Shelly Guo, David J. DeWitt, Jeffrey F. Naughton: Building a Scaleable Geo-Spatial DBMS: Technology, Implementation, and Evaluation. SIGMOD Conference 1997: 336-347
  7. Ming-Syan Chen, Philip S. Yu, Kun-Lung Wu: Optimization of Parallel Execution for Multi-Join Queries. IEEE Trans. Knowl. Data Eng. 8(3): 416-428(1996)
  8. Viswanath Poosala, Yannis E. Ioannidis: Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing. VLDB 1996: 448-459
  9. Luc Bouganim, Daniela Florescu, Patrick Valduriez: Dynamic Load Balancing in Hierarchical Parallel Database Systems. VLDB 1996: 436-447
  10. Jignesh M. Patel, David J. DeWitt: Partition Based Spatial-Merge Join. SIGMOD Conference 1996: 259-270
  11. Nadejda Biscondi, André Flory, Lionel Brunie: Parallel Databases: Structured Query Optimization. ADBIS 1996: 146-152
  12. Kien A. Hua, Chiang Lee, Chau M. Hua: Dynamic Load Balancing in Multicomputer Database Systems Using Partition Tuning. IEEE Trans. Knowl. Data Eng. 7(6): 968-983(1995)
  13. Erhard Rahm, Robert Marek: Dynamic Multi-Resource Load Balancing in Parallel Database Systems. VLDB 1995: 395-406
  14. Gerhard Weikum: Tutorial on Parallel Database Systems. ICDT 1995: 33-37
  15. Kien A. Hua, Wallapak Tavanapong, Honesty C. Young: A Performance Evaluation of Load Balancing Techniques for Join Operations on Multicomputer Database Systems. ICDE 1995: 44-51
  16. Peter J. Haas, Arun N. Swami: Sampling-Based Selectivity Estimation for Joins Using Augmented Frequent Value Statistics. ICDE 1995: 522-531
  17. Lilian Harada, Masaru Kitsuregawa: Dynamic Join Product Skew Handling for Hash-Joins in Shared-Nothing Database Systems. DASFAA 1995: 246-255
  18. Kent E. Seamons, Marianne Winslett: Physical Schemas for Large Multidimensional Arrays in Scientific Computing Applications. SSDBM 1994: 218-227
  19. Peter J. Haas, Jeffrey F. Naughton, Arun N. Swami: On the Relative Cost of Sampling for Join Selectivity Estimation. PODS 1994: 14-24
  20. X. Zhao, Roger G. Johnson, Nigel J. Martin: DBJ - A Dynamic Balancing Hash Join Algorithm in Multiprocessor Database Systems (Extented Abstract). EDBT 1994: 301-308
  21. Kien A. Hua, Yu-lung Lo, Honesty C. Young: Considering Data Skew Factor in Multi-Way Join Query Optimization for Parallel Execution. VLDB J. 2(3): 303-330(1993)
  22. Erhard Rahm, Robert Marek: Analysis of Dynamic Load Balancing Strategies for Parallel Shared Nothing Database Systems. VLDB 1993: 182-193
  23. Ambuj Shatdal, Jeffrey F. Naughton: Using Shared Virtual Memory for Parallel Join Processing. SIGMOD Conference 1993: 119-128
  24. Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Arun N. Swami: Fixed-Precision Estimation of Join Selectivity. PODS 1993: 190-201
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:45:50 2009