ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Selectivity Estimation and Query Optimization in Large Databases with Highly Skewed Distribution of Column Values.

Clifford A. Lynch: Selectivity Estimation and Query Optimization in Large Databases with Highly Skewed Distribution of Column Values. VLDB 1988: 240-251
@inproceedings{DBLP:conf/vldb/Lynch88,
  author    = {Clifford A. Lynch},
  editor    = {Fran\c{c}ois Bancilhon and
               David J. DeWitt},
  title     = {Selectivity Estimation and Query Optimization in Large Databases
               with Highly Skewed Distribution of Column Values},
  booktitle = {Fourteenth International Conference on Very Large Data Bases,
               August 29 - September 1, 1988, Los Angeles, California, USA,
               Proceedings},
  publisher = {Morgan Kaufmann},
  year      = {1988},
  isbn      = {0-934613-75-3},
  pages     = {240-251},
  ee        = {db/conf/vldb/Lynch88.html},
  crossref  = {DBLP:conf/vldb/88},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

When column values in a large database follow highly skewed distributions (such as Zipf distributions, typically found in textual databases), qnery optimizers in current relational systems often fail to choose optimal query plans even for simple single-relation queries. The major cause of these optimization failures is incorrect predicate selectivity estimation; the likelihood and cost of such errors are quantified. A scheme for adding userdefined selectivity estimators to a relational DBMS is proposed. The paper defines a series of new selectivity estimation methods that work well with highly skewed value distributions and then compares them to currently used methods such as uniform approximation and histograms. Empirical data from a large bibliographic database is used throughout the analyses in this paper.

Copyright © 1988 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 4, VLDB '75-'88" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

François Bancilhon, David J. DeWitt (Eds.): Fourteenth International Conference on Very Large Data Bases, August 29 - September 1, 1988, Los Angeles, California, USA, Proceedings. Morgan Kaufmann 1988, ISBN 0-934613-75-3
BibTeX

References

[Bendat & Piersol 1971]
...
[Christodoulakis 1981]
...
[DLA 1987]
...
[Fedorowicz 1981]
...
[Kahn 1967]
...
[Knuth 1968]
Donald E. Knuth: The Art of Computer Programming, Volume I: Fundamental Algorithms, 2nd Edition. Addison-Wesley 1973
BibTeX
[Knuth 1973]
Donald E. Knuth: The Art of Computer Programming, Volume III: Sorting and Searching. Addison-Wesley 1973, ISBN 0-201-03803-X
BibTeX
[Kooi 1980]
Robert Kooi: The Optimization of Queries in Relational Databases. Ph.D. thesis, Case Western Reserve University 1980
BibTeX
[Lynch & Stonebraker 1988]
Clifford A. Lynch, Michael Stonebraker: Extended User-Defined Indexing with Application to Textual Databases. VLDB 1988: 306-317 BibTeX
[Lynch 1987]
...
[Piatetsky-Shapiro & Connell 1984]
Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276 BibTeX
[RTI 1986]
...
[Selinger et al. 1979]
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 BibTeX
[Stonebraker et al. 1976]
Michael Stonebraker, Eugene Wong, Peter Kreps, Gerald Held: The Design and Implementation of INGRES. ACM Trans. Database Syst. 1(3): 189-222(1976) BibTeX
[Stonebraker 1986]
Michael Stonebraker: Inclusion of New Types in Relational Data Base Systems. ICDE 1986: 262-269 BibTeX
[Stonebraker & Rowe 1985]
Michael Stonebraker, Lawrence A. Rowe: The Design of Postgres. SIGMOD Conference 1986: 340-355 BibTeX
[Zaniolo 1983]
Carlo Zaniolo: The Database Language GEM. SIGMOD Conference 1983: 207-218 BibTeX

Referenced by

  1. Chiang Lee, Zue-An Chang: Utilizing Page-Level Join Index for Optimization in Parallel Join Execution. IEEE Trans. Knowl. Data Eng. 7(6): 900-914(1995)
  2. Paolo Ciaccia, Dario Maio: Domains and Active Domains: What This Distinction Implies for the Estimation of Projection Sizes in Relational Databases. IEEE Trans. Knowl. Data Eng. 7(4): 641-655(1995)
  3. Joel L. Wolf, Daniel M. Dias, Philip S. Yu, John Turek: New Algorithms for Parallelizing Relational Database Joins in the Presence of Data Skew. IEEE Trans. Knowl. Data Eng. 6(6): 990-997(1994)
  4. Chung-Min Chen, Nick Roussopoulos: Adaptive Selectivity Estimation Using Query Feedback. SIGMOD Conference 1994: 161-172
  5. Arun N. Swami, K. Bernhard Schiefer: On the Estimation of Join Result Sizes. EDBT 1994: 287-300
  6. Wei Sun, Yibei Ling, Naphtali Rishe, Yi Deng: An Instant and Accurate Estimation Method for Joins and Selection in a Retrieval-Intensive Environment. SIGMOD Conference 1993: 79-88
  7. Allen Van Gelder: Multiple Join Size Estimation by Virtual Domains. PODS 1993: 180-189
  8. Alfons Kemper, Guido Moerkotte, Michael Steinbrunn: Optimizing Boolean Expressions in Object-Bases. VLDB 1992: 79-90
  9. Mauro Negri, Giuseppe Pelagatti: Distributive Join: A New Algorithm for Joining Relations. ACM Trans. Database Syst. 16(4): 655-669(1991)
  10. Christopher B. Walton, Alfred G. Dale, Roy M. Jenevein: A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins. VLDB 1991: 537-548
  11. Joel L. Wolf, Daniel M. Dias, Philip S. Yu, John Turek: An Effective Algorithm for Parallelizing Hash Joins in the Presence of Data Skew. ICDE 1991: 200-209
  12. M. Seetha Lakshmi, Philip S. Yu: Effectiveness of Parallel Joins. IEEE Trans. Knowl. Data Eng. 2(4): 410-424(1990)
  13. Himawan Gunadhi, Arie Segev: A Framework for Query Optimization in Temporal Databases. SSDBM 1990: 131-147
  14. Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider: Practical Selectivity Estimation through Adaptive Sampling. SIGMOD Conference 1990: 1-11
  15. Richard J. Lipton, Jeffrey F. Naughton: Query Size Estimation by Adaptive Sampling. PODS 1990: 40-46
  16. Meng Chang Chen, Lawrence McNamee, Norman S. Matloff: Selectivity Estimation Using Homogeneity Measurement. ICDE 1990: 304-310
  17. C. Mohan, Donald J. Haderle, Yun Wang, Josephine M. Cheng: Single Table Access Using Multiple Indexes: Optimization, Execution, and Concurrency Control Techniques. EDBT 1990: 29-43
  18. Richard J. Lipton, Jeffrey F. Naughton: Estimating the Size of Generalized Transitive Closures. VLDB 1989: 165-171
  19. Silvio Salza, Mario Terranova: Evaluating the Size of Queries on Relational Databases with non Uniform Distribution and Stochastic Dependence. SIGMOD Conference 1989: 8-14
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:45:38 2009