ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation.

Arnd Christian König, Gerhard Weikum: Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation. VLDB 1999: 423-434
@inproceedings{DBLP:conf/vldb/KonigW99,
  author    = {Arnd Christian K{\"o}nig and
               Gerhard Weikum},
  editor    = {Malcolm P. Atkinson and
               Maria E. Orlowska and
               Patrick Valduriez and
               Stanley B. Zdonik and
               Michael L. Brodie},
  title     = {Combining Histograms and Parametric Curve Fitting for Feedback-Driven
               Query Result-size Estimation},
  booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
               Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
               UK},
  publisher = {Morgan Kaufmann},
  year      = {1999},
  isbn      = {1-55860-615-7},
  pages     = {423-434},
  ee        = {db/conf/vldb/KonigW99.html},
  crossref  = {DBLP:conf/vldb/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

This paper aims to improve the accuracy of query result-size estimations in query optimizers by leveraging the dynamic feedback obtained from observations on the executed query workload. To this end, an approximate "synopsis" of data-value distributions is devised that combines histograms with parametric curve fitting, leading to a specific class of linear splines. The approach reconciles the benefits of histograms, simplicity and versatility, with those of parametric techniques especially the adaptivity to statistically biased and dynamically evolving query workloads.

The paper presents efficient algorithms for constructing the linear-spline synopsis for data-value distributions from a moving window of the most recent observations on (the most critical) query executions. The approach is worked out in full detail for capturing frequency as well as density distributions of data values, and it is shown how result size estimations are inferred for exact-match and range queries as well as projections and grouping. To a large extent, the developed methods can be generalized to multi-dimensional distributions, thus bearing the ability to capture correlations among attributes as well. Experimental studies underline the accuracy of the developed estimation methods, outperforming the best known classes of histograms.

Copyright © 1999 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, Michael L. Brodie (Eds.): VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK. Morgan Kaufmann 1999, ISBN 1-55860-615-7
Contents BibTeX

References

[1]
Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997) BibTeX
[2]
Surajit Chaudhuri: An Overview of Query Optimization in Relational Systems. PODS 1998: 34-43 BibTeX
[3]
Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya: Random Sampling for Histogram Construction: How much is enough? SIGMOD Conference 1998: 436-447 BibTeX
[4]
Chung-Min Chen, Nick Roussopoulos: Adaptive Selectivity Estimation Using Query Feedback. SIGMOD Conference 1994: 161-172 BibTeX
[5]
...
[6]
...
[7]
Sumit Ganguly, Phillip B. Gibbons, Yossi Matias, Abraham Silberschatz: Bifocal Sampling for Skew-Resistant Join Size Estimation. SIGMOD Conference 1996: 271-281 BibTeX
[8]
Phillip B. Gibbons, Yossi Matias: New Sampling-Based Summary Statistics for Improving Approximate Query Answers. SIGMOD Conference 1998: 331-342 BibTeX
[9]
...
[10]
Phillip B. Gibbons, Yossi Matias, Viswanath Poosala: Fast Incremental Maintenance of Approximate Histograms. VLDB 1997: 466-475 BibTeX
[11]
...
[12]
Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Arun N. Swami: Selectivity and Cost Estimation for Joins Based on Random Sampling. J. Comput. Syst. Sci. 52(3): 550-569(1996) BibTeX
[13]
Yannis E. Ioannidis: Universality of Serial Histograms. VLDB 1993: 256-267 BibTeX
[14]
Yannis E. Ioannidis, Stavros Christodoulakis: On the Propagation of Errors in the Size of Join Results. SIGMOD Conference 1991: 268-277 BibTeX
[15]
Yannis E. Ioannidis, Stavros Christodoulakis: Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results. ACM Trans. Database Syst. 18(4): 709-748(1993) BibTeX
[16]
Yannis E. Ioannidis, Viswanath Poosala: Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD Conference 1995: 233-244 BibTeX
[17]
H. V. Jagadish, Nick Koudas, S. Muthukrishnan, Viswanath Poosala, Kenneth C. Sevcik, Torsten Suel: Optimal Histograms with Quality Guarantees. VLDB 1998: 275-286 BibTeX
[18]
Navin Kabra, David J. DeWitt: Efficient Mid-Query Re-Optimization of Sub-Optimal Query Execution Plans. SIGMOD Conference 1998: 106-117 BibTeX
[19]
Yibei Ling, Wei Sun: An Evaluation of Sampling-Based Size Estimation Methods for Selections in Database Systems. ICDE 1995: 532-539 BibTeX
[20]
Michael V. Mannino, Paicheng Chu, Thomas Sager: Statistical Profile Estimation in Database Systems. ACM Comput. Surv. 20(3): 191-221(1988) BibTeX
[21]
Yossi Matias, Jeffrey Scott Vitter, Min Wang: Wavelet-Based Histograms for Selectivity Estimation. SIGMOD Conference 1998: 448-459 BibTeX
[22]
...
[23]
...
[24]
Viswanath Poosala: Histogram-Based Estimation Techniques in Database Systems. Ph.D. thesis, Univ. of Wisconsin-Madison 1997
BibTeX
[25]
Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305 BibTeX
[26]
...
[27]
...
[28]
Wei Sun, Yibei Ling, Naphtali Rishe, Yi Deng: An Instant and Accurate Estimation Method for Joins and Selection in a Retrieval-Intensive Environment. SIGMOD Conference 1993: 79-88 BibTeX
[29]
...

Referenced by

  1. Dimitrios Gunopulos, George Kollios, Vassilis J. Tsotras, Carlotta Domeniconi: Approximating Multi-Dimensional Aggregate Range Queries over Real Attributes. SIGMOD Conference 2000: 463-474
  2. Christos Faloutsos, Bernhard Seeger, Agma J. M. Traina, Caetano Traina Jr.: Spatial Join Selectivity Using Power Laws. SIGMOD Conference 2000: 177-188
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:46:28 2009