ACM SIGMOD Anthology TODS dblp.uni-trier.de

Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results.

Yannis E. Ioannidis, Stavros Christodoulakis: Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results. ACM Trans. Database Syst. 18(4): 709-748(1993)
@article{DBLP:journals/tods/IoannidisC93,
  author    = {Yannis E. Ioannidis and
               Stavros Christodoulakis},
  title     = {Optimal Histograms for Limiting Worst-Case Error Propagation
               in the Size of Join Results},
  journal   = {ACM Trans. Database Syst.},
  volume    = {18},
  number    = {4},
  year      = {1993},
  pages     = {709-748},
  ee        = {http://doi.acm.org/10.1145/169725.169708, db/journals/tods/IoannidisC93.html},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

Many current relational database systems use some form of histograms to approximate the frequency distribution of values in the attributes of relations and on this basis estimate query result sizes and access plan costs. The errors that exist in the histogram approximations directly or transitively affect many estimates derived by the database system. We identify the class of serial histograms and its subclass of end-biased histograms; the latter is of particular interest because such histograms are used in several database systems. We concentrate on equality join queries without function symbols where each relation is joined on the same attribute(s) for all joins in which it participates. Join queries of this restricted type are called t-clique queries. We show that the optimal histogram for reducing the worst-case error in the result size of such a query is always serial. For queries with one join and no function symbols (all of which are vacuously t-clique queries), we present results on finding the optimal serial histogram and the optimal end-biased histogram based on the query characteristics and the frequency distributions of values in the join attributes of the query relations. Finally, we prove that for t-clique queries with a very large number of joins, high-biased histograms (which form a subclass of end-biased histograms) are always optimal. To construct a histogram for the join attribute(s) of a relation, the values in the attribute(s) must first be sorted based on their frequency and then assigned into buckets according to the optimality results above.

Copyright © 1993 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Joint ACM SIGMOD / IEEE Computer Society Anthology

CDROM Version: Load the CDROM "Volume 3 Issue 2, TODS 1991-1995, TKDE 1989-1992" and ... DVD Version: Load ACM SIGMOD Anthology DVD 2" and ... BibTeX

Online Edition: ACM Digital Library

[Index Terms]
[Full Text in PDF Format, 2381 KB]

References

[1]
Stavros Christodoulakis: Estimating Block Transfers and Join Sizes. SIGMOD Conference 1983: 40-54 BibTeX
[2]
Stavros Christodoulakis: Implications of Certain Assumptions in Database Performance Evaluation. ACM Trans. Database Syst. 9(2): 163-186(1984) BibTeX
[3]
...
[4]
Yannis E. Ioannidis, Stavros Christodoulakis: On the Propagation of Errors in the Size of Join Results. SIGMOD Conference 1991: 268-277 BibTeX
[5]
...
[6]
Nabil Kamel, Roger King: A Model of Data Distribution Based on Texture Analysis. SIGMOD Conference 1985: 319-325 BibTeX
[7]
Robert Kooi: The Optimization of Queries in Relational Databases. Ph.D. thesis, Case Western Reserve University 1980
BibTeX
[8]
Lothar F. Mackert, Guy M. Lohman: R* Optimizer Validation and Performance Evaluation for Distributed Queries. VLDB 1986: 149-159 BibTeX
[9]
Lothar F. Mackert, Guy M. Lohman: R* Optimizer Validation and Performance Evaluation for Local Queries. SIGMOD Conference 1986: 84-95 BibTeX
[10]
Michael V. Mannino, Paicheng Chu, Thomas Sager: Statistical Profile Estimation in Database Systems. ACM Comput. Surv. 20(3): 191-221(1988) BibTeX
[11]
Albert W. Marshall, Ingram Olkin: Inequalities: Theory of Majorization and Its Application. Academic Press 1979, ISBN 0-12-473750-1
BibTeX
[12]
T. H. Merrett, Ekow J. Otoo: Distribution Models of Relations. VLDB 1979: 418-425 BibTeX
[13]
M. Muralikrishna, David J. DeWitt: Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries. SIGMOD Conference 1988: 28-36 BibTeX
[14]
B. Muthuswamy, Larry Kerschberg: A Detailed Database Statistics Model for Realtional Query Optimization. ACM Annual Conference - The range of computing: mid-80's perspective 1985: 439-448 BibTeX
[15]
Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276 BibTeX
[16]
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 BibTeX
[17]
...
[18]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949
BibTeX

Referenced by

  1. Nick Koudas, S. Muthukrishnan, Divesh Srivastava: Optimal Histograms for Hierarchical Range Queries. PODS 2000: 196-204
  2. Arnd Christian König, Gerhard Weikum: Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation. VLDB 1999: 423-434
  3. Björn Blohsfeld, Dieter Korus, Bernhard Seeger: A Comparison of Selectivity Estimators for Range Queries on Metric Attributes. SIGMOD Conference 1999: 239-250
  4. Pedro Furtado, Henrique Madeira: Summary Grids: Building Accurate Multidimensional Histograms. DASFAA 1999: 187-194
  5. Phillip B. Gibbons, Yossi Matias: New Sampling-Based Summary Statistics for Improving Approximate Query Answers. SIGMOD Conference 1998: 331-342
  6. Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997)
  7. Viswanath Poosala, Yannis E. Ioannidis: Selectivity Estimation Without the Attribute Value Independence Assumption. VLDB 1997: 486-495
  8. Daniela Florescu, Daphne Koller, Alon Y. Levy: Using Probabilistic Information in Data Integration. VLDB 1997: 216-225
  9. Christos Faloutsos, Yossi Matias, Abraham Silberschatz: Modeling Skewed Distribution Using Multifractals and the `80-20' Law. VLDB 1996: 307-317
  10. Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305
  11. Paolo Ciaccia, Dario Maio: Domains and Active Domains: What This Distinction Implies for the Estimation of Projection Sizes in Relational Databases. IEEE Trans. Knowl. Data Eng. 7(4): 641-655(1995)
  12. Yannis E. Ioannidis, Viswanath Poosala: Histogram-Based Solutions to Diverse Database Estimation Problems. IEEE Data Eng. Bull. 18(3): 10-18(1995)
  13. Alberto Belussi, Christos Faloutsos: Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension. VLDB 1995: 299-310
  14. Peter J. Haas, Arun N. Swami: Sampling-Based Selectivity Estimation for Joins Using Augmented Frequent Value Statistics. ICDE 1995: 522-531
  15. Yannis E. Ioannidis: Universality of Serial Histograms. VLDB 1993: 256-267
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
TODS, ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Tue Jun 24 18:39:15 2008