Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing.

Viswanath Poosala, Yannis E. Ioannidis: Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing. VLDB 1996: 448-459
  author    = {Viswanath Poosala and
               Yannis E. Ioannidis},
  editor    = {T. M. Vijayaraman and
               Alejandro P. Buchmann and
               C. Mohan and
               Nandlal L. Sarda},
  title     = {Estimation of Query-Result Distribution and its Application in
               Parallel-Join Load Balancing},
  booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
               Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
  publisher = {Morgan Kaufmann},
  year      = {1996},
  isbn      = {1-55860-382-4},
  pages     = {448-459},
  ee        = {db/conf/vldb/PoosalaI96.html},
  crossref  = {DBLP:conf/vldb/96},
  bibsource = {DBLP,}


Many commercial database systems use some form of statistics, typically histograms, to summarize the contents of relations and permit efficient estimation of required quantities. While there has been considerable work done on identifying good histograms for the estimation of query-result sizes, little attention has been paid to the estimation of the data distribution of the result, which is of importance in query optimization. In this paper, we prove that the optimal histogram for estimating the size of the result of a join operator is optimal for estimating its data distribution as well. We also study the effectiveness of these optimal histograms in the context of an important application that requires estimates for the data distribution of a query result: load-balancing for parallel Hybrid hash joins. We derive a cost formula to capture the effect of data skew in both the input and output relations on the load and use the optimal histograms to estimate this cost most accurately. We have developed and implemented a load balancing algorithm using these histograms on a simulator for the Gamma parallel database system. The experiments establish the superiority of this approach compared to earlier ones in handling all kinds and levels of skew while incurring negligible overhead.

Copyright © 1996 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.): VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India. Morgan Kaufmann 1996, ISBN 1-55860-382-4
Contents BibTeX

Electronic Edition


Haran Boral, William Alexander, Larry Clay, George P. Copeland, Scott Danforth, Michael J. Franklin, Brian E. Hart, Marc G. Smith, Patrick Valduriez: Prototyping Bubba, A Highly Parallel Database System. IEEE Trans. Knowl. Data Eng. 2(1): 4-24(1990) BibTeX
Stavros Christodoulakis: Implications of Certain Assumptions in Database Performance Evaluation. ACM Trans. Database Syst. 9(2): 163-186(1984) BibTeX
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High Performance Dataflow Database Machine. VLDB 1986: 228-237 BibTeX
David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen: The Gamma Database Machine Project. IEEE Trans. Knowl. Data Eng. 2(1): 44-62(1990) BibTeX
David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri: Practical Skew Handling in Parallel Joins. VLDB 1992: 27-40 BibTeX
Laura M. Haas, Walter Chang, Guy M. Lohman, John McPherson, Paul F. Wilms, George Lapis, Bruce G. Lindsay, Hamid Pirahesh, Michael J. Carey, Eugene J. Shekita: Starburst Mid-Flight: As the Dust Clears. IEEE Trans. Knowl. Data Eng. 2(1): 143-160(1990) BibTeX
Kien A. Hua, Chiang Lee: Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning. VLDB 1991: 525-535 BibTeX
Kien A. Hua, Wallapak Tavanapong, Honesty C. Young: A Performance Evaluation of Load Balancing Techniques for Join Operations on Multicomputer Database Systems. ICDE 1995: 44-51 BibTeX
Yannis E. Ioannidis, Stavros Christodoulakis: On the Propagation of Errors in the Size of Join Results. SIGMOD Conference 1991: 268-277 BibTeX
Yannis E. Ioannidis: Universality of Serial Histograms. VLDB 1993: 256-267 BibTeX
Yannis E. Ioannidis, Viswanath Poosala: Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD Conference 1995: 233-244 BibTeX
Yannis E. Ioannidis, Viswanath Poosala: Histogram-Based Solutions to Diverse Database Estimation Problems. IEEE Data Eng. Bull. 18(3): 10-18(1995) BibTeX
Masaru Kitsuregawa, Yasushi Ogawa: Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC). VLDB 1990: 210-221 BibTeX
M. Seetha Lakshmi, Philip S. Yu: Effect of Skew on Join Performance in Parallel Architectures. DPDS 1988: 107-120 BibTeX
M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36 BibTeX
Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305 BibTeX
Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276 BibTeX
Donovan A. Schneider, David J. DeWitt: A Performance Evaluation of Four Parallel Join Algorithms in a Shared-Nothing Multiprocessor Environment. SIGMOD Conference 1989: 110-121 BibTeX
Leonard D. Shapiro: Join Processing in Database Systems with Large Main Memories. ACM Trans. Database Syst. 11(3): 239-264(1986) BibTeX
Ambuj Shatdal, Jeffrey F. Naughton: Using Shared Virtual Memory for Parallel Join Processing. SIGMOD Conference 1993: 119-128 BibTeX
Michael Stonebraker: The Case for Shared Nothing. IEEE Database Eng. Bull. 9(1): 4-9(1986) BibTeX
Jeffrey Scott Vitter: Random Sampling with a Reservoir. ACM Trans. Math. Softw. 11(1): 37-57(1985) BibTeX
Christopher B. Walton, Alfred G. Dale, Roy M. Jenevein: A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins. VLDB 1991: 537-548 BibTeX
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949

Referenced by

  1. Jeffrey Scott Vitter, Min Wang: Approximate Computation of Multidimensional Aggregates of Sparse Data Using Wavelets. SIGMOD Conference 1999: 193-204
  2. Joseph M. Hellerstein: Optimization Techniques for Queries with Expensive Methods. ACM Trans. Database Syst. 23(2): 113-157(1998)
  3. Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997)
  4. Phillip B. Gibbons, Yossi Matias, Viswanath Poosala: Fast Incremental Maintenance of Approximate Histograms. VLDB 1997: 466-475
  5. Minos N. Garofalakis, Yannis E. Ioannidis: Parallel Query Scheduling and Optimization with Time- and Space-Shared Resources. VLDB 1997: 296-305
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (, Corrections:
DBLP: Copyright © by Michael Ley (, last change: Sat May 16 23:46:12 2009