ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Bifocal Sampling for Skew-Resistant Join Size Estimation.

Sumit Ganguly, Phillip B. Gibbons, Yossi Matias, Abraham Silberschatz: Bifocal Sampling for Skew-Resistant Join Size Estimation. SIGMOD Conference 1996: 271-281
@inproceedings{DBLP:conf/sigmod/GangulyGMS96,
  author    = {Sumit Ganguly and
               Phillip B. Gibbons and
               Yossi Matias and
               Abraham Silberschatz},
  editor    = {H. V. Jagadish and
               Inderpal Singh Mumick},
  title     = {Bifocal Sampling for Skew-Resistant Join Size Estimation},
  booktitle = {Proceedings of the 1996 ACM SIGMOD International Conference on
               Management of Data, Montreal, Quebec, Canada, June 4-6, 1996},
  publisher = {ACM Press},
  year      = {1996},
  pages     = {271-281},
  ee        = {http://doi.acm.org/10.1145/233269.233340, db/conf/sigmod/GangulyGMS96.html},
  crossref  = {DBLP:conf/sigmod/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

This paper introduces bifocal sampling, a new technique for estimating the size of an equi-join of two relations. Bifocal sampling classifies tuples in each relation into two groups, sparse and dense, based on the number of tuples with the same join value. Distinct estimation procedures are employed that focus on various combinations for joining tuples (e.g., for estimating the number of joining tuples that are dense in both relations). This combination of estimation procedures overcomes some well-known problems in previous schemes, enabling good estimates with no a priori knowledge about the data distribution. The estimate obtained by the bifocal sampling algorithm is proven to lie with high probability within a small constant factor of the actual join size, regardless of the skew, as long as the join size is Omega(n lg n), for relations consisting of n tuples. The algorithm requires a sample size at most O(sqrt(n) lg n). By contrast, previous algorithms using a sample of similar size may require the join size to be Omega(n sqrt(n)) to guarantee an accurate estimate. Experimental results support the theoretical claims and show that bifocal sampling is practical and effective.

Copyright © 1996 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

Online Version (ACM WWW Account required): Full Text in PDF Format

CDROM Version: Load the CDROM "Volume 1 Issue 1, SIGMOD '93-'97" and ...

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

H. V. Jagadish, Inderpal Singh Mumick (Eds.): Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4-6, 1996. ACM Press 1996 BibTeX , SIGMOD Record 25(2), June 1996
Contents

Online Edition: ACM Digital Library

[Index Terms]
[Full Text in PDF Format, 1184 KB]

References

[HNS94]
Peter J. Haas, Jeffrey F. Naughton, Arun N. Swami: On the Relative Cost of Sampling for Join Selectivity Estimation. PODS 1994: 14-24 BibTeX
[HNSS93]
Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Arun N. Swami: Fixed-Precision Estimation of Join Selectivity. PODS 1993: 190-201 BibTeX
[HNSS95]
Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Lynne Stokes: Sampling-Based Estimation of the Number of Distinct Values of an Attribute. VLDB 1995: 311-322 BibTeX
[HÖD91]
Wen-Chi Hou, Gultekin Özsoyoglu, Erdogan Dogdu: Error-Constraint COUNT Query Evaluation in Relational Databases. SIGMOD Conference 1991: 278-287 BibTeX
[Hoe63]
...
[HÖT88]
Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja: Statistical Estimators for Relational Algebra Expressions. PODS 1988: 276-287 BibTeX
[HÖT89]
Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja: Processing Aggregate Relational Queries with Hard Time Constraints. SIGMOD Conference 1989: 68-77 BibTeX
[HS92]
Peter J. Haas, Arun N. Swami: Sequential Sampling Procedures for Query Size Estimation. SIGMOD Conference 1992: 341-350 BibTeX
[LN89]
Richard J. Lipton, Jeffrey F. Naughton: Estimating the Size of Generalized Transitive Closures. VLDB 1989: 165-171 BibTeX
[LN90]
Richard J. Lipton, Jeffrey F. Naughton: Query Size Estimation by Adaptive Sampling. PODS 1990: 40-46 BibTeX
[LN95]
Richard J. Lipton, Jeffrey F. Naughton: Query Size Estimation by Adaptive Sampling. J. Comput. Syst. Sci. 51(1): 18-25(1995) BibTeX
[LNS90]
Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider: Practical Selectivity Estimation through Adaptive Sampling. SIGMOD Conference 1990: 1-11 BibTeX
[LNSS93]
Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri: Efficient Sampling Strategies for Relational Database Operations. Theor. Comput. Sci. 116(1&2): 195-226(1993) BibTeX
[LS92]
Yibei Ling, Wei Sun: A Suppletment to Sampling-Based Methods for Query Size Estimation in a Database System. SIGMOD Record 21(4): 12-15(1992) BibTeX

Referenced by

  1. Surajit Chaudhuri, Rajeev Motwani: On Sampling and Relational Operators. IEEE Data Eng. Bull. 22(4): 41-46(1999)
  2. Arnd Christian König, Gerhard Weikum: Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation. VLDB 1999: 423-434
  3. Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya: On Random Sampling over Joins. SIGMOD Conference 1999: 263-274
  4. Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy: Join Synopses for Approximate Query Answering. SIGMOD Conference 1999: 275-286
  5. Noga Alon, Phillip B. Gibbons, Yossi Matias, Mario Szegedy: Tracking Join and Self-Join Sizes in Limited Storage. PODS 1999: 10-20
  6. Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997)
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:40:32 2009