ACM SIGMOD Anthology VLDB dblp.uni-trier.de

A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins.

Christopher B. Walton, Alfred G. Dale, Roy M. Jenevein: A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins. VLDB 1991: 537-548
@inproceedings{DBLP:conf/vldb/WaltonDJ91,
  author    = {Christopher B. Walton and
               Alfred G. Dale and
               Roy M. Jenevein},
  editor    = {Guy M. Lohman and
               Am\'{\i}lcar Sernadas and
               Rafael Camps},
  title     = {A Taxonomy and Performance Model of Data Skew Effects in Parallel
               Joins},
  booktitle = {17th International Conference on Very Large Data Bases, September
               3-6, 1991, Barcelona, Catalonia, Spain, Proceedings},
  publisher = {Morgan Kaufmann},
  year      = {1991},
  isbn      = {1-55860-150-3},
  pages     = {537-548},
  ee        = {db/conf/vldb/WaltonDJ91.html},
  crossref  = {DBLP:conf/vldb/91},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

Recent work on parallel joins and data skew has concentrated on algorithm design without considering the causes and characteristics of data skew itself. Existing analytic models of skew do not contain enough information to fully describe data skew in parallel implementations. Because the assumptions made about the nature of skew vary between authors, it is almost impossible to make valid comparisons of parallel algorithms. In this paper, a taxonomy of skew effects is developed, and a new performance model is introduced. The model is used to compare the performance of two parallel join algorithms.

Copyright © 1991 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Guy M. Lohman, Amílcar Sernadas, Rafael Camps (Eds.): 17th International Conference on Very Large Data Bases, September 3-6, 1991, Barcelona, Catalonia, Spain, Proceedings. Morgan Kaufmann 1991, ISBN 1-55860-150-3
BibTeX

References

[Baru & Frieder 1989]
Chaitanya K. Baru, Ophir Frieder: Database Operations in a Cube-Connected Multicomputer System. IEEE Trans. Computers 38(6): 920-927(1989) BibTeX
[Baru et al. 1987]
Chaitanya K. Baru, Ophir Frieder, Dilip D. Kandlur, Mark E. Segal: Join on a Cube: Analysis, Simulation, and Implementation. IWDM 1987: 61-74 BibTeX
[Boral 1988]
Haran Boral: Parallelism and Data Management. JCDKB 1988: 362-373 BibTeX
[Christodoulakis 1983]
Stavros Christodoulakis: Estimating record selectivities. Inf. Syst. 8(2): 105-115(1983) BibTeX
[Copeland et al. 1988]
George P. Copeland, William Alexander, Ellen E. Boughter, Tom W. Keller: Data Placement In Bubba. SIGMOD Conference 1988: 99-108 BibTeX
[DeWitt 1986]
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High Performance Dataflow Database Machine. VLDB 1986: 228-237 BibTeX
[DeWitt et al. 1988]
David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider: A Performance Analysis of the Gamma Database Machine. SIGMOD Conference 1988: 350-360 BibTeX
[DeWitt 1990]
David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen: The Gamma Database Machine Project. IEEE Trans. Knowl. Data Eng. 2(1): 44-62(1990) BibTeX
[DeWitt et al. 1984]
David J. DeWitt, Randy H. Katz, Frank Olken, Leonard D. Shapiro, Michael Stonebraker, David A. Wood: Implementation Techniques for Main Memory Database Systems. SIGMOD Conference 1984: 1-8 BibTeX
[Frieder 1990]
Ophir Frieder: Multiprocessor Algorithms for Relational-Database Operators on Hypercube Systems. IEEE Computer 23(11): 13-28(1990) BibTeX
[Gerber 1986]
...
[Gerber & DeWitt 1987]
...
[Hu & Muntz 1989]
...
[Kitsuregawa et al. 1983]
...
[Lakshmi & Yu 1988]
M. Seetha Lakshmi, Philip S. Yu: Effect of Skew on Join Performance in Parallel Architectures. DPDS 1988: 107-120 BibTeX
[Lakshmi & Yu 1989]
M. Seetha Lakshmi, Philip S. Yu: Limiting Factors of Join Performance on Parallel Processors. ICDE 1989: 488-496 BibTeX
[Lynch 1988]
Clifford A. Lynch: Selectivity Estimation and Query Optimization in Large Databases with Highly Skewed Distribution of Column Values. VLDB 1988: 240-251 BibTeX
[Montgomery et al. 1983]
...
[Omiecinski & Liu 1989]
...
[Richardson et al. 1987]
James P. Richardson, Hongjun Lu, Krishna P. Mikkilineni: Design and Evaluation of Parallel Pipelined Join Algorithms. SIGMOD Conference 1987: 399-409 BibTeX
[Schneider 1990]
Donovan A. Schneider, David J. DeWitt: Tradeoffs in Processing Complex Join Queries via Hashing in Multiprocessor Database Machines. VLDB 1990: 469-480 BibTeX
[Schneider & DeWitt 1989]
Donovan A. Schneider, David J. DeWitt: A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment. SIGMOD Conference 1989: 110-121 BibTeX
[Stonebraker 1986]
Michael Stonebraker: The Case for Shared Nothing. IEEE Database Eng. Bull. 9(1): 4-9(1986) BibTeX
[Teradata 1983]
...
[Walton et al. 1990]
...
[Wolf et al. 1990]
Joel L. Wolf, Daniel M. Dias, Philip S. Yu, John Turek: An Effective Algorithm for Parallelizing Hash Joins in the Presence of Data Skew. ICDE 1991: 200-209 BibTeX
[Wolf et al. 1990]
Joel L. Wolf, Daniel M. Dias, Philip S. Yu: An Effective Algorithm for Parallelizing Sort Merge in the Presence of Data Skew. DPDS 1990: 103-115 BibTeX

Referenced by

  1. Haruo Yokota, Yasuhiko Kanemasa, Jun Miyazaki: Fat-Btree: An Update-Conscious Parallel Directory Structure. ICDE 1999: 448-457
  2. Jose Alvin G. Gendrano, Bruce C. Huang, Jim M. Rodrigue, Bongki Moon, Richard T. Snodgrass: Parallel Algorithms for Computing Temporal Aggregates. ICDE 1999: 418-427
  3. Seigo Muto, Masaru Kitsuregawa: A Dynamic Load Balancing Strategy for Parallel Datacube Computation. DOLAP 1999: 67-72
  4. Junping Sun, William I. Grosky: Dynamic Maintenance of Multidimensional Range Data Partitioning for Parallel Data Processing. DOLAP 1998: 72-79
  5. Manish Mehta, David J. DeWitt: Data Placement in Shared-Nothing Parallel Database Systems. VLDB J. 6(1): 53-72(1997)
  6. Ming-Syan Chen, Hui-I Hsiao, Philip S. Yu: On Applying Hash Filters to Improving the Execution of Multi-Join Queries. VLDB J. 6(2): 121-131(1997)
  7. Sakti Pramanik, Walid R. Tout: The NUMA with Clusters of Processors for Parallel Join. IEEE Trans. Knowl. Data Eng. 9(4): 653-660(1997)
  8. Jignesh M. Patel, Jie-Bing Yu, Navin Kabra, Kristin Tufte, Biswadeep Nag, Josef Burger, Nancy E. Hall, Karthikeyan Ramasamy, Roger Lueder, Curt J. Ellmann, Jim Kupsch, Shelly Guo, David J. DeWitt, Jeffrey F. Naughton: Building a Scaleable Geo-Spatial DBMS: Technology, Implementation, and Evaluation. SIGMOD Conference 1997: 336-347
  9. Evan P. Harris, Kotagiri Ramamohanarao: Join Algorithm Costs Revisited. VLDB J. 5(1): 64-84(1996)
  10. Viswanath Poosala, Yannis E. Ioannidis: Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing. VLDB 1996: 448-459
  11. Luc Bouganim, Daniela Florescu, Patrick Valduriez: Dynamic Load Balancing in Hierarchical Parallel Database Systems. VLDB 1996: 436-447
  12. Chiang Lee, Zue-An Chang: Utilizing Page-Level Join Index for Optimization in Parallel Join Execution. IEEE Trans. Knowl. Data Eng. 7(6): 900-914(1995)
  13. Kien A. Hua, Chiang Lee, Chau M. Hua: Dynamic Load Balancing in Multicomputer Database Systems Using Partition Tuning. IEEE Trans. Knowl. Data Eng. 7(6): 968-983(1995)
  14. Ming-Syan Chen, Ming-Ling Lo, Philip S. Yu, Honesty C. Young: Applying Segmented Right-Deep Trees to Pipelining Multiple Hash Joins. IEEE Trans. Knowl. Data Eng. 7(4): 656-668(1995)
  15. Erhard Rahm, Robert Marek: Dynamic Multi-Resource Load Balancing in Parallel Database Systems. VLDB 1995: 395-406
  16. Ambuj Shatdal, Jeffrey F. Naughton: Adaptive Parallel Aggregation Algorithms. SIGMOD Conference 1995: 104-114
  17. Kien A. Hua, Wallapak Tavanapong, Honesty C. Young: A Performance Evaluation of Load Balancing Techniques for Join Operations on Multicomputer Database Systems. ICDE 1995: 44-51
  18. Lilian Harada, Masaru Kitsuregawa: Dynamic Join Product Skew Handling for Hash-Joins in Shared-Nothing Database Systems. DASFAA 1995: 246-255
  19. Won S. Lee, Phillip C.-Y. Sheu: An Object-Oriented Query Evaluation Scheme for Logical Databases in Massively Parallel Environment. IEEE Trans. Knowl. Data Eng. 6(1): 181-187(1994)
  20. Radek Vingralek, Yuri Breitbart, Gerhard Weikum: Distributed File Organization with Scalable Cost/Performance. SIGMOD Conference 1994: 253-264
  21. X. Zhao, Roger G. Johnson, Nigel J. Martin: DBJ - A Dynamic Balancing Hash Join Algorithm in Multiprocessor Database Systems (Extented Abstract). EDBT 1994: 301-308
  22. Kien A. Hua, Yu-lung Lo, Honesty C. Young: Considering Data Skew Factor in Multi-Way Join Query Optimization for Parallel Execution. VLDB J. 2(3): 303-330(1993)
  23. Goetz Graefe: Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 25(2): 73-170(1993)
  24. Erhard Rahm, Robert Marek: Analysis of Dynamic Load Balancing Strategies for Parallel Shared Nothing Database Systems. VLDB 1993: 182-193
  25. Ambuj Shatdal, Jeffrey F. Naughton: Using Shared Virtual Memory for Parallel Join Processing. SIGMOD Conference 1993: 119-128
  26. David J. DeWitt, Jim Gray: Parallel Database Systems: The Future of High Performance Database Systems. Commun. ACM 35(6): 85-98(1992)
  27. David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri: Practical Skew Handling in Parallel Joins. VLDB 1992: 27-40
  28. Hongjun Lu, Kian-Lee Tan: Dynamic and Load-balanced Task-Oriented Datbase Query Processing in Parallel Systems. EDBT 1992: 357-372
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:45:50 2009