ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Universality of Serial Histograms.

Yannis E. Ioannidis: Universality of Serial Histograms. VLDB 1993: 256-267
@inproceedings{DBLP:conf/vldb/Ioannidis93,
  author    = {Yannis E. Ioannidis},
  editor    = {Rakesh Agrawal and
               Se{\'a}n Baker and
               David A. Bell},
  title     = {Universality of Serial Histograms},
  booktitle = {19th International Conference on Very Large Data Bases, August
               24-27, 1993, Dublin, Ireland, Proceedings},
  publisher = {Morgan Kaufmann},
  year      = {1993},
  isbn      = {1-55860-152-X},
  pages     = {256-267},
  ee        = {db/conf/vldb/Ioannidis93.html},
  crossref  = {DBLP:conf/vldb/93},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

Many current relational database systems use some form of histograms to approximate the frequency distribution of values in the attributes of relations and based on them estimate query result sizes and access plan costs. The errors that exist in the histogram approximations directly or transitively affect many estimates derived by the database system. We identify the class of serial histograms and demonstrate that they are optimal for reducing the query result size error for several classes of queries when the actual query result size (and hence the value of that error) reaches some extreme. Specifically, serial histograms are shown to be optimal for arbitrary tree equality-join queries when the query result size is maximized, whether or not the attribute independence assumption holds, and when the query result size is minimized and the attribute independence assumption holds. We also show that the expected error for any such query is always zero under all histograms, and thus argue that histograms should be chosen based on the reduction of the extreme-cases error, since reduction of the expected error is meaningless.

Copyright © 1993 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Rakesh Agrawal, Seán Baker, David A. Bell (Eds.): 19th International Conference on Very Large Data Bases, August 24-27, 1993, Dublin, Ireland, Proceedings. Morgan Kaufmann 1993, ISBN 1-55860-152-X
Contents BibTeX

References

[Chr83]
Stavros Christodoulakis: Estimating Block Transfers and Join Sizes. SIGMOD Conference 1983: 40-54 BibTeX
[Chr84]
Stavros Christodoulakis: Implications of Certain Assumptions in Database Performance Evaluation. ACM Trans. Database Syst. 9(2): 163-186(1984) BibTeX
[IC92]
Yannis E. Ioannidis, Stavros Christodoulakis: Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results. ACM Trans. Database Syst. 18(4): 709-748(1993) BibTeX
[Ioa93]
...
[KK85]
Nabil Kamel, Roger King: A Model of Data Distribution Based on Texture Analysis. SIGMOD Conference 1985: 319-325 BibTeX
[Koo80]
Robert Kooi: The Optimization of Queries in Relational Databases. Ph.D. thesis, Case Western Reserve University 1980
BibTeX
[MCS88]
Michael V. Mannino, Paicheng Chu, Thomas Sager: Statistical Profile Estimation in Database Systems. ACM Comput. Surv. 20(3): 191-221(1988) BibTeX
[MD88]
M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36 BibTeX
[MK85]
B. Muthuswamy, Larry Kerschberg: A Detailed Database Statistics Model for Realtional Query Optimization. ACM Annual Conference - The range of computing: mid-80's perspective 1985: 439-448 BibTeX
[MO79a]
Albert W. Marshall, Ingram Olkin: Inequalities: Theory of Majorization and Its Application. Academic Press 1979, ISBN 0-12-473750-1
BibTeX
[MO79b]
T. H. Merrett, Ekow J. Otoo: Distribution Models of Relations. VLDB 1979: 418-425 BibTeX
[PSC84]
Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276 BibTeX
[SAC+79]
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 BibTeX
[Sch64]
...
[Zip49]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949
BibTeX

Referenced by

  1. Nick Koudas, S. Muthukrishnan, Divesh Srivastava: Optimal Histograms for Hierarchical Range Queries. PODS 2000: 196-204
  2. H. V. Jagadish, Nick Koudas, S. Muthukrishnan: Mining Deviants in a Time Series Database. VLDB 1999: 102-113
  3. Arnd Christian König, Gerhard Weikum: Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation. VLDB 1999: 423-434
  4. H. V. Jagadish, Olga Kapitskaia, Raymond T. Ng, Divesh Srivastava: Multi-Dimensional Substring Selectivity Estimation. VLDB 1999: 387-398
  5. Ju-Hong Lee, Deok-Hwan Kim, Chin-Wan Chung: Multi-dimensional Selectivity Estimation Using Compressed Histogram Information. SIGMOD Conference 1999: 205-214
  6. H. V. Jagadish, Raymond T. Ng, Divesh Srivastava: Substring Selectivity Estimation. PODS 1999: 249-260
  7. S. Muthukrishnan, Viswanath Poosala, Torsten Suel: On Rectangular Partitionings in Two Dimensions: Algorithms, Complexity, and Applications. ICDT 1999: 236-256
  8. Pedro Furtado, Henrique Madeira: Summary Grids: Building Accurate Multidimensional Histograms. DASFAA 1999: 187-194
  9. H. V. Jagadish, Nick Koudas, S. Muthukrishnan, Viswanath Poosala, Kenneth C. Sevcik, Torsten Suel: Optimal Histograms with Quality Guarantees. VLDB 1998: 275-286
  10. Phillip B. Gibbons, Yossi Matias: New Sampling-Based Summary Statistics for Improving Approximate Query Answers. SIGMOD Conference 1998: 331-342
  11. Surajit Chaudhuri: An Overview of Query Optimization in Relational Systems. PODS 1998: 34-43
  12. Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997)
  13. Viswanath Poosala, Yannis E. Ioannidis: Selectivity Estimation Without the Attribute Value Independence Assumption. VLDB 1997: 486-495
  14. Viswanath Poosala, Yannis E. Ioannidis: Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing. VLDB 1996: 448-459
  15. Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305
  16. Nick Roussopoulos, Chung-Min Chen, Stephen Kelley, Alex Delis, Yannis Papakonstantinou: The ADMS Project: View R Us. IEEE Data Eng. Bull. 18(2): 19-28(1995)
  17. Yannis E. Ioannidis, Viswanath Poosala: Histogram-Based Solutions to Diverse Database Estimation Problems. IEEE Data Eng. Bull. 18(3): 10-18(1995)
  18. Yannis E. Ioannidis, Viswanath Poosala: Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD Conference 1995: 233-244
  19. Chung-Min Chen, Nick Roussopoulos: Adaptive Selectivity Estimation Using Query Feedback. SIGMOD Conference 1994: 161-172
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:45:56 2009