ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning.

Kien A. Hua, Chiang Lee: Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning. VLDB 1991: 525-535
@inproceedings{DBLP:conf/vldb/HuaL91,
  author    = {Kien A. Hua and
               Chiang Lee},
  editor    = {Guy M. Lohman and
               Am\'{\i}lcar Sernadas and
               Rafael Camps},
  title     = {Handling Data Skew in Multiprocessor Database Computers Using
               Partition Tuning},
  booktitle = {17th International Conference on Very Large Data Bases, September
               3-6, 1991, Barcelona, Catalonia, Spain, Proceedings},
  publisher = {Morgan Kaufmann},
  year      = {1991},
  isbn      = {1-55860-150-3},
  pages     = {525-535},
  ee        = {db/conf/vldb/HuaL91.html},
  crossref  = {DBLP:conf/vldb/91},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

Shared nothing multiprocessor architecture is known to be more scalable to support very large databases. Compared to other join strategies, a hash-based join algorithm is particularly efficient and easily parallelized for this computation model. However, this hardware structure is very sensitive to the data skew problem. Unless the parallel hash join algorithm includes some load balancing mechanism,skew effect can deteriorate the system performance severely.

In this paper, we propose two skew avoidance techniques and one skew resolution method. In particular, three new parallel hash join algorithms are presented. We developed an analytical model to study the effectiveness of these algorithms. The performance study indicates that the proposed techniques offer substantial improvement over the conventional strategies in the presence of data skew. It is also interesting to observe that the skew avoidance techniques provide join strategies that are robust against data skew; whereas the skew resolution method offers an adaptive join strategy that outperforms the conventional algorithms for any skew condition.

Copyright © 1991 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Guy M. Lohman, Amílcar Sernadas, Rafael Camps (Eds.): 17th International Conference on Very Large Data Bases, September 3-6, 1991, Barcelona, Catalonia, Spain, Proceedings. Morgan Kaufmann 1991, ISBN 1-55860-150-3
BibTeX

References

[1]
...
[2]
Haran Boral, William Alexander, Larry Clay, George P. Copeland, Scott Danforth, Michael J. Franklin, Brian E. Hart, Marc G. Smith, Patrick Valduriez: Prototyping Bubba, A Highly Parallel Database System. IEEE Trans. Knowl. Data Eng. 2(1): 4-24(1990) BibTeX
[3]
David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen: The Gamma Database Machine Project. IEEE Trans. Knowl. Data Eng. 2(1): 44-62(1990) BibTeX
[4]
Susanne Englert, Jim Gray, Terrye Kocher, Praful Shah: A Benchmark of NonStop SQL Release 2 Demonstrating Near-Linear Speedup and Scaleup on Large Databases. SIGMETRICS 1990: 245-246 BibTeX
[5]
...
[6]
...
[7]
Michael Stonebraker: The Case for Shared Nothing. IEEE Database Eng. Bull. 9(1): 4-9(1986) BibTeX
[8]
M. Seetha Lakshmi, Philip S. Yu: Effect of Skew on Join Performance in Parallel Architectures. DPDS 1988: 107-120 BibTeX
[9]
...
[10]
Donovan A. Schneider, David J. DeWitt: A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment. SIGMOD Conference 1989: 110-121 BibTeX
[11]
Masaru Kitsuregawa, Yasushi Ogawa: Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC). VLDB 1990: 210-221 BibTeX
[12]
...
[13]
Kien A. Hua, Chiang Lee: An Adaptive Data Placement Scheme for Parallel Database Computer Systems. VLDB 1990: 493-506 BibTeX
[14]
...
[15]
M. Seetha Lakshmi, Philip S. Yu: Limiting Factors of Join Performance on Parallel Processors. ICDE 1989: 488-496 BibTeX
[16]
...
[17]
...
[18]
...
[19]
...
[20]
David J. DeWitt, Randy H. Katz, Frank Olken, Leonard D. Shapiro, Michael Stonebraker, David A. Wood: Implementation Techniques for Main Memory Database Systems. SIGMOD Conference 1984: 1-8 BibTeX

Referenced by

  1. Haruo Yokota, Yasuhiko Kanemasa, Jun Miyazaki: Fat-Btree: An Update-Conscious Parallel Directory Structure. ICDE 1999: 448-457
  2. Seigo Muto, Masaru Kitsuregawa: A Dynamic Load Balancing Strategy for Parallel Datacube Computation. DOLAP 1999: 67-72
  3. Manish Mehta, David J. DeWitt: Data Placement in Shared-Nothing Parallel Database Systems. VLDB J. 6(1): 53-72(1997)
  4. Jignesh M. Patel, Jie-Bing Yu, Navin Kabra, Kristin Tufte, Biswadeep Nag, Josef Burger, Nancy E. Hall, Karthikeyan Ramasamy, Roger Lueder, Curt J. Ellmann, Jim Kupsch, Shelly Guo, David J. DeWitt, Jeffrey F. Naughton: Building a Scaleable Geo-Spatial DBMS: Technology, Implementation, and Evaluation. SIGMOD Conference 1997: 336-347
  5. Chiang Lee, Chi-Sheng Shih, Yaw-Huei Chen: A Graph-Theoretic Model for Optimizing Large Join Queries. DASFAA 1997: 87-96
  6. Byoung Mo Im, Myoung-Ho Kim, Jae Soo Yoo: MIN-Entropy: A New Signature File Declustering Algorithm for Intra-Query Parallelism. DASFAA 1997: 235-242
  7. Evan P. Harris, Kotagiri Ramamohanarao: Join Algorithm Costs Revisited. VLDB J. 5(1): 64-84(1996)
  8. Viswanath Poosala, Yannis E. Ioannidis: Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing. VLDB 1996: 448-459
  9. Nadejda Biscondi, André Flory, Lionel Brunie: Parallel Databases: Structured Query Optimization. ADBIS 1996: 146-152
  10. Chiang Lee, Zue-An Chang: Utilizing Page-Level Join Index for Optimization in Parallel Join Execution. IEEE Trans. Knowl. Data Eng. 7(6): 900-914(1995)
  11. Kien A. Hua, Chiang Lee, Chau M. Hua: Dynamic Load Balancing in Multicomputer Database Systems Using Partition Tuning. IEEE Trans. Knowl. Data Eng. 7(6): 968-983(1995)
  12. Ming-Syan Chen, Ming-Ling Lo, Philip S. Yu, Honesty C. Young: Applying Segmented Right-Deep Trees to Pipelining Multiple Hash Joins. IEEE Trans. Knowl. Data Eng. 7(4): 656-668(1995)
  13. Kien A. Hua, Wallapak Tavanapong, Honesty C. Young: A Performance Evaluation of Load Balancing Techniques for Join Operations on Multicomputer Database Systems. ICDE 1995: 44-51
  14. Lilian Harada, Masaru Kitsuregawa: Dynamic Join Product Skew Handling for Hash-Joins in Shared-Nothing Database Systems. DASFAA 1995: 246-255
  15. Jeong-Ki Kim, Jae-Woo Chang: A New Parallel Signature File Method for Efficient Information Retrieval. CIKM 1995: 66-73
  16. X. Zhao, Roger G. Johnson, Nigel J. Martin: DBJ - A Dynamic Balancing Hash Join Algorithm in Multiprocessor Database Systems (Extented Abstract). EDBT 1994: 301-308
  17. Kien A. Hua, Yu-lung Lo, Honesty C. Young: Considering Data Skew Factor in Multi-Way Join Query Optimization for Parallel Execution. VLDB J. 2(3): 303-330(1993)
  18. Goetz Graefe: Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 25(2): 73-170(1993)
  19. Pavel Zezula, Paolo Ciaccia, Paolo Tiberio: Hamming Filters: A Dynamic Signature File Organization for Parallel Stores. VLDB 1993: 314-327
  20. Ambuj Shatdal, Jeffrey F. Naughton: Using Shared Virtual Memory for Parallel Join Processing. SIGMOD Conference 1993: 119-128
  21. Chiang Lee, Zue-An Chang: Workload Balance and Page Access Scheduling For Parallel Joins In Shared-Nothing Systems. ICDE 1993: 411-418
  22. David J. DeWitt, Jim Gray: Parallel Database Systems: The Future of High Performance Database Systems. Commun. ACM 35(6): 85-98(1992)
  23. David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri: Practical Skew Handling in Parallel Joins. VLDB 1992: 27-40
  24. Hongjun Lu, Kian-Lee Tan: Dynamic and Load-balanced Task-Oriented Datbase Query Processing in Parallel Systems. EDBT 1992: 357-372
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:45:50 2009