ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing.

Viswanath Poosala, Yannis E. Ioannidis: Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing. VLDB 1996: 448-459
@inproceedings{DBLP:conf/vldb/PoosalaI96,
  author    = {Viswanath Poosala and
               Yannis E. Ioannidis},
  editor    = {T. M. Vijayaraman and
               Alejandro P. Buchmann and
               C. Mohan and
               Nandlal L. Sarda},
  title     = {Estimation of Query-Result Distribution and its Application in
               Parallel-Join Load Balancing},
  booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
               Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
  publisher = {Morgan Kaufmann},
  year      = {1996},
  isbn      = {1-55860-382-4},
  pages     = {448-459},
  ee        = {db/conf/vldb/PoosalaI96.html},
  crossref  = {DBLP:conf/vldb/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

Many commercial database systems use some form of statistics, typically histograms, to summarize the contents of relations and permit efficient estimation of required quantities. While there has been considerable work done on identifying good histograms for the estimation of query-result sizes, little attention has been paid to the estimation of the data distribution of the result, which is of importance in query optimization. In this paper, we prove that the optimal histogram for estimating the size of the result of a join operator is optimal for estimating its data distribution as well. We also study the effectiveness of these optimal histograms in the context of an important application that requires estimates for the data distribution of a query result: load-balancing for parallel Hybrid hash joins. We derive a cost formula to capture the effect of data skew in both the input and output relations on the load and use the optimal histograms to estimate this cost most accurately. We have developed and implemented a load balancing algorithm using these histograms on a simulator for the Gamma parallel database system. The experiments establish the superiority of this approach compared to earlier ones in handling all kinds and levels of skew while incurring negligible overhead.

Copyright © 1996 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.): VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India. Morgan Kaufmann 1996, ISBN 1-55860-382-4
Contents BibTeX

Electronic Edition

References

[B+90]
Haran Boral, William Alexander, Larry Clay, George P. Copeland, Scott Danforth, Michael J. Franklin, Brian E. Hart, Marc G. Smith, Patrick Valduriez: Prototyping Bubba, A Highly Parallel Database System. IEEE Trans. Knowl. Data Eng. 2(1): 4-24(1990) BibTeX
[Bro94]
...
[Chr84]
Stavros Christodoulakis: Implications of Certain Assumptions in Database Performance Evaluation. ACM Trans. Database Syst. 9(2): 163-186(1984) BibTeX
[DGG+86]
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High Performance Dataflow Database Machine. VLDB 1986: 228-237 BibTeX
[DGS+90]
David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen: The Gamma Database Machine Project. IEEE Trans. Knowl. Data Eng. 2(1): 44-62(1990) BibTeX
[DNSS92]
David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri: Practical Skew Handling in Parallel Joins. VLDB 1992: 27-40 BibTeX
[H+90]
Laura M. Haas, Walter Chang, Guy M. Lohman, John McPherson, Paul F. Wilms, George Lapis, Bruce G. Lindsay, Hamid Pirahesh, Michael J. Carey, Eugene J. Shekita: Starburst Mid-Flight: As the Dust Clears. IEEE Trans. Knowl. Data Eng. 2(1): 143-160(1990) BibTeX
[HL91]
Kien A. Hua, Chiang Lee: Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning. VLDB 1991: 525-535 BibTeX
[HY95]
Kien A. Hua, Wallapak Tavanapong, Honesty C. Young: A Performance Evaluation of Load Balancing Techniques for Join Operations on Multicomputer Database Systems. ICDE 1995: 44-51 BibTeX
[IC91]
Yannis E. Ioannidis, Stavros Christodoulakis: On the Propagation of Errors in the Size of Join Results. SIGMOD Conference 1991: 268-277 BibTeX
[Ioa93]
Yannis E. Ioannidis: Universality of Serial Histograms. VLDB 1993: 256-267 BibTeX
[IP95a]
Yannis E. Ioannidis, Viswanath Poosala: Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD Conference 1995: 233-244 BibTeX
[IP95b]
Yannis E. Ioannidis, Viswanath Poosala: Histogram-Based Solutions to Diverse Database Estimation Problems. IEEE Data Eng. Bull. 18(3): 10-18(1995) BibTeX
[KO90]
Masaru Kitsuregawa, Yasushi Ogawa: Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC). VLDB 1990: 210-221 BibTeX
[LY88]
M. Seetha Lakshmi, Philip S. Yu: Effect of Skew on Join Performance in Parallel Architectures. DPDS 1988: 107-120 BibTeX
[MD88]
M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36 BibTeX
[PI96]
...
[PIHS96]
Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305 BibTeX
[PSC84]
Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276 BibTeX
[SD89]
Donovan A. Schneider, David J. DeWitt: A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment. SIGMOD Conference 1989: 110-121 BibTeX
[Sha86]
Leonard D. Shapiro: Join Processing in Database Systems with Large Main Memories. ACM Trans. Database Syst. 11(3): 239-264(1986) BibTeX
[SN93]
Ambuj Shatdal, Jeffrey F. Naughton: Using Shared Virtual Memory for Parallel Join Processing. SIGMOD Conference 1993: 119-128 BibTeX
[Sto86]
Michael Stonebraker: The Case for Shared Nothing. IEEE Database Eng. Bull. 9(1): 4-9(1986) BibTeX
[Vit85]
Jeffrey Scott Vitter: Random Sampling with a Reservoir. ACM Trans. Math. Softw. 11(1): 37-57(1985) BibTeX
[WDJ91]
Christopher B. Walton, Alfred G. Dale, Roy M. Jenevein: A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins. VLDB 1991: 537-548 BibTeX
[Zip49]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949
BibTeX

Referenced by

  1. Jeffrey Scott Vitter, Min Wang: Approximate Computation of Multidimensional Aggregates of Sparse Data Using Wavelets. SIGMOD Conference 1999: 193-204
  2. Joseph M. Hellerstein: Optimization Techniques for Queries with Expensive Methods. ACM Trans. Database Syst. 23(2): 113-157(1998)
  3. Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997)
  4. Phillip B. Gibbons, Yossi Matias, Viswanath Poosala: Fast Incremental Maintenance of Approximate Histograms. VLDB 1997: 466-475
  5. Minos N. Garofalakis, Yannis E. Ioannidis: Parallel Query Scheduling and Optimization with Time- and Space-Shared Resources. VLDB 1997: 296-305
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:46:12 2009