ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Balancing Histogram Optimality and Practicality for Query Result Size Estimation.

Yannis E. Ioannidis, Viswanath Poosala: Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD Conference 1995: 233-244
@inproceedings{DBLP:conf/sigmod/IoannidisP95,
  author    = {Yannis E. Ioannidis and
               Viswanath Poosala},
  editor    = {Michael J. Carey and
               Donovan A. Schneider},
  title     = {Balancing Histogram Optimality and Practicality for Query Result
               Size Estimation},
  booktitle = {Proceedings of the 1995 ACM SIGMOD International Conference on
               Management of Data, San Jose, California, May 22-25, 1995},
  publisher = {ACM Press},
  year      = {1995},
  pages     = {233-244},
  ee        = {http://doi.acm.org/10.1145/223784.223841, db/conf/sigmod/sigmod95-18.html},
  crossref  = {DBLP:conf/sigmod/95},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

Many current database systems use histograms to approximate the frequency distribution of values in the attributes of relations and based on them estimate query result sizes and access plan costs. In choosing among the various histograms, one has to balance between two conflicting goals: optimality, so that generated estimates have the least error, and practicality, so that histograms can be constructed and maintained efficiently. In this paper, we present both theoretical and experimental results on several issues related to this trade-off. Our overall conclusion is that the most effective approach is to focus on the class of histograms that accurately maintain the frequencies of a few attribute values and assume the uniform distribution for the rest, and choose for each relation the histogram in that class that is optimal for a self-join query.

Copyright © 1995 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

Online Version (ACM WWW Account required): Full Text in PDF Format

CDROM Version: Load the CDROM "Volume 1 Issue 1, SIGMOD '93-'97" and ...

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Michael J. Carey, Donovan A. Schneider (Eds.): Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, San Jose, California, May 22-25, 1995. ACM Press 1995 BibTeX , SIGMOD Record 24(2), June 1995
Contents

Online Edition: ACM Digital Library

[Index Terms]
[Full Text in PDF Format, 1342 KB]

References

[1]
Chung-Min Chen, Nick Roussopoulos: Adaptive Selectivity Estimation Using Query Feedback. SIGMOD Conference 1994: 161-172 BibTeX
[2]
Stavros Christodoulakis: Estimating Block Transfers and Join Sizes. SIGMOD Conference 1983: 40-54 BibTeX
[3]
Stavros Christodoulakis: Implications of Certain Assumptions in Database Performance Evaluation. ACM Trans. Database Syst. 9(2): 163-186(1984) BibTeX
[4]
...
[5]
...
[6]
Christos Faloutsos, H. V. Jagadish: On B-Tree Indices for Skewed Distributions. VLDB 1992: 363-374 BibTeX
[7]
Peter J. Haas, Arun N. Swami: Sequential Sampling Procedures for Query Size Estimation. SIGMOD Conference 1992: 341-350 BibTeX
[8]
Peter J. Haas, Arun N. Swami: Sampling-Based Selectivity Estimation for Joins Using Augmented Frequent Value Statistics. ICDE 1995: 522-531 BibTeX
[9]
Yannis E. Ioannidis: Universality of Serial Histograms. VLDB 1993: 256-267 BibTeX
[10]
Yannis E. Ioannidis, Stavros Christodoulakis: On the Propagation of Errors in the Size of Join Results. SIGMOD Conference 1991: 268-277 BibTeX
[11]
...
[12]
...
[13]
Nabil Kamel, Roger King: A Model of Data Distribution Based on Texture Analysis. SIGMOD Conference 1985: 319-325 BibTeX
[14]
...
[15]
Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider: Practical Selectivity Estimation through Adaptive Sampling. SIGMOD Conference 1990: 1-11 BibTeX
[16]
Michael V. Mannino, Paicheng Chu, Thomas Sager: Statistical Profile Estimation in Database Systems. ACM Comput. Surv. 20(3): 191-221(1988) BibTeX
[17]
...
[18]
T. H. Merrett, Ekow J. Otoo: Distribution Models of Relations. VLDB 1979: 418-425 BibTeX
[19]
M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36 BibTeX
[20]
B. Muthuswamy, Larry Kerschberg: A Detailed Database Statistics Model for Realtional Query Optimization. ACM Annual Conference - The range of computing: mid-80's perspective 1985: 439-448 BibTeX
[21]
Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276 BibTeX
[22]
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 BibTeX
[23]
Yun Wang: Experience from a Real Life Query Optimizer. SIGMOD Conference 1992: 286 BibTeX
[24]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949
BibTeX

Referenced by

  1. Francesco Buccafurri, Filippo Furfaro, Domenico Saccà: Estimating Range Queries Using Aggregate Data with Integrity Constraints: A Probabilistic Approach. ICDT 2001: 390-404
  2. Nick Koudas, S. Muthukrishnan, Divesh Srivastava: Optimal Histograms for Hierarchical Range Queries. PODS 2000: 196-204
  3. Viswanath Poosala, Venkatesh Ganti, Yannis E. Ioannidis: Approximate Query Answering using Histograms. IEEE Data Eng. Bull. 22(4): 5-14(1999)
  4. H. V. Jagadish, Nick Koudas, S. Muthukrishnan: Mining Deviants in a Time Series Database. VLDB 1999: 102-113
  5. Arnd Christian König, Gerhard Weikum: Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation. VLDB 1999: 423-434
  6. H. V. Jagadish, Olga Kapitskaia, Raymond T. Ng, Divesh Srivastava: Multi-Dimensional Substring Selectivity Estimation. VLDB 1999: 387-398
  7. Viswanath Poosala, Venkatesh Ganti: Fast Approximate Answers to Aggregate Queries on a Data Cube. SSDBM 1999: 24-33
  8. Ju-Hong Lee, Deok-Hwan Kim, Chin-Wan Chung: Multi-dimensional Selectivity Estimation Using Compressed Histogram Information. SIGMOD Conference 1999: 205-214
  9. H. V. Jagadish, Raymond T. Ng, Divesh Srivastava: Substring Selectivity Estimation. PODS 1999: 249-260
  10. Noga Alon, Phillip B. Gibbons, Yossi Matias, Mario Szegedy: Tracking Join and Self-Join Sizes in Limited Storage. PODS 1999: 10-20
  11. S. Muthukrishnan, Viswanath Poosala, Torsten Suel: On Rectangular Partitionings in Two Dimensions: Algorithms, Complexity, and Applications. ICDT 1999: 236-256
  12. Pedro Furtado, Henrique Madeira: Summary Grids: Building Accurate Multidimensional Histograms. DASFAA 1999: 187-194
  13. Joseph M. Hellerstein: Optimization Techniques for Queries with Expensive Methods. ACM Trans. Database Syst. 23(2): 113-157(1998)
  14. M. Seetha Lakshmi, Shaoyu Zhou: Selectivity Estimation in Extensible Databases - A Neural Network Approach. VLDB 1998: 623-627
  15. H. V. Jagadish, Nick Koudas, S. Muthukrishnan, Viswanath Poosala, Kenneth C. Sevcik, Torsten Suel: Optimal Histograms with Quality Guarantees. VLDB 1998: 275-286
  16. Navin Kabra, David J. DeWitt: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans. SIGMOD Conference 1998: 106-117
  17. Phillip B. Gibbons, Yossi Matias: New Sampling-Based Summary Statistics for Improving Approximate Query Answers. SIGMOD Conference 1998: 331-342
  18. Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya: Random Sampling for Histogram Construction: How much is enough? SIGMOD Conference 1998: 436-447
  19. Hubert Naacke, Georges Gardarin, Anthony Tomasic: Leveraging Mediator Cost Models with Heterogeneous Data Sources. ICDE 1998: 351-360
  20. Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997)
  21. Viswanath Poosala, Yannis E. Ioannidis: Selectivity Estimation Without the Attribute Value Independence Assumption. VLDB 1997: 486-495
  22. Christos Faloutsos, H. V. Jagadish, Nikolaos Sidiropoulos: Recovering Information from Summary Data. VLDB 1997: 36-45
  23. Yannis E. Ioannidis: Query Optimization. ACM Comput. Surv. 28(1): 121-123(1996)
  24. Viswanath Poosala, Yannis E. Ioannidis: Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing. VLDB 1996: 448-459
  25. Christos Faloutsos, Yossi Matias, Abraham Silberschatz: Modeling Skewed Distribution Using Multifractals and the `80-20' Law. VLDB 1996: 307-317
  26. Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305
  27. P. Krishnan, Jeffrey Scott Vitter, Balakrishna R. Iyer: Estimating Alphanumeric Selectivity in the Presence of Wildcards. SIGMOD Conference 1996: 282-293
  28. Yannis E. Ioannidis, Viswanath Poosala: Histogram-Based Solutions to Diverse Database Estimation Problems. IEEE Data Eng. Bull. 18(3): 10-18(1995)
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:40:26 2009