Bifocal Sampling for Skew-Resistant Join Size Estimation.
Sumit Ganguly, Phillip B. Gibbons, Yossi Matias, Abraham Silberschatz:
Bifocal Sampling for Skew-Resistant Join Size Estimation.
SIGMOD Conference 1996: 271-281@inproceedings{DBLP:conf/sigmod/GangulyGMS96,
author = {Sumit Ganguly and
Phillip B. Gibbons and
Yossi Matias and
Abraham Silberschatz},
editor = {H. V. Jagadish and
Inderpal Singh Mumick},
title = {Bifocal Sampling for Skew-Resistant Join Size Estimation},
booktitle = {Proceedings of the 1996 ACM SIGMOD International Conference on
Management of Data, Montreal, Quebec, Canada, June 4-6, 1996},
publisher = {ACM Press},
year = {1996},
pages = {271-281},
ee = {http://doi.acm.org/10.1145/233269.233340, db/conf/sigmod/GangulyGMS96.html},
crossref = {DBLP:conf/sigmod/96},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX
Abstract
This paper introduces bifocal sampling, a new technique for
estimating the size of an equi-join of two relations.
Bifocal sampling classifies tuples in each relation into two groups,
sparse and dense, based on the number of tuples with the same join value.
Distinct estimation procedures are employed that focus on various
combinations for joining tuples (e.g., for estimating the number of
joining tuples that are dense in both relations).
This combination of estimation procedures overcomes some well-known
problems in previous schemes, enabling good estimates with no a priori
knowledge about the data distribution. The estimate obtained by the
bifocal sampling algorithm is proven to lie with high probability
within a small constant factor of the actual join size,
regardless of the skew, as long as the join size is
Omega(n lg n), for relations consisting of n tuples.
The algorithm requires a sample size at most
O(sqrt(n) lg n).
By contrast, previous algorithms using a sample of similar size may
require the join size to be Omega(n sqrt(n)) to guarantee an
accurate estimate. Experimental results support the theoretical
claims and show that bifocal sampling is practical and effective.
Copyright © 1996 by the ACM,
Inc., used by permission. Permission to make
digital or hard copies is granted provided that
copies are not made or distributed for profit or
direct commercial advantage, and that copies show
this notice on the first page or initial screen of
a display along with the full citation.
Online Version (ACM WWW Account required): Full Text in PDF Format
CDROM Version: Load the CDROM "Volume 1 Issue 1, SIGMOD '93-'97" and ...
DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...
BibTeX
Printed Edition
H. V. Jagadish, Inderpal Singh Mumick (Eds.):
Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4-6, 1996.
ACM Press 1996 BibTeX
,
SIGMOD Record 25(2),
June 1996
Contents
[Index Terms]
[Full Text in PDF Format, 1184 KB]
References
- [HNS94]
- Peter J. Haas, Jeffrey F. Naughton, Arun N. Swami:
On the Relative Cost of Sampling for Join Selectivity Estimation.
PODS 1994: 14-24 BibTeX
- [HNSS93]
- Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Arun N. Swami:
Fixed-Precision Estimation of Join Selectivity.
PODS 1993: 190-201 BibTeX
- [HNSS95]
- Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Lynne Stokes:
Sampling-Based Estimation of the Number of Distinct Values of an Attribute.
VLDB 1995: 311-322 BibTeX
- [HÖD91]
- Wen-Chi Hou, Gultekin Özsoyoglu, Erdogan Dogdu:
Error-Constraint COUNT Query Evaluation in Relational Databases.
SIGMOD Conference 1991: 278-287 BibTeX
- [Hoe63]
- ...
- [HÖT88]
- Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja:
Statistical Estimators for Relational Algebra Expressions.
PODS 1988: 276-287 BibTeX
- [HÖT89]
- Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja:
Processing Aggregate Relational Queries with Hard Time Constraints.
SIGMOD Conference 1989: 68-77 BibTeX
- [HS92]
- Peter J. Haas, Arun N. Swami:
Sequential Sampling Procedures for Query Size Estimation.
SIGMOD Conference 1992: 341-350 BibTeX
- [LN89]
- Richard J. Lipton, Jeffrey F. Naughton:
Estimating the Size of Generalized Transitive Closures.
VLDB 1989: 165-171 BibTeX
- [LN90]
- Richard J. Lipton, Jeffrey F. Naughton:
Query Size Estimation by Adaptive Sampling.
PODS 1990: 40-46 BibTeX
- [LN95]
- Richard J. Lipton, Jeffrey F. Naughton:
Query Size Estimation by Adaptive Sampling.
J. Comput. Syst. Sci. 51(1): 18-25(1995) BibTeX
- [LNS90]
- Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider:
Practical Selectivity Estimation through Adaptive Sampling.
SIGMOD Conference 1990: 1-11 BibTeX
- [LNSS93]
- Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri:
Efficient Sampling Strategies for Relational Database Operations.
Theor. Comput. Sci. 116(1&2): 195-226(1993) BibTeX
- [LS92]
- Yibei Ling, Wei Sun:
A Suppletment to Sampling-Based Methods for Query Size Estimation in a Database System.
SIGMOD Record 21(4): 12-15(1992) BibTeX
Referenced by
- Surajit Chaudhuri, Rajeev Motwani:
On Sampling and Relational Operators.
IEEE Data Eng. Bull. 22(4): 41-46(1999)
- Arnd Christian König, Gerhard Weikum:
Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation.
VLDB 1999: 423-434
- Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya:
On Random Sampling over Joins.
SIGMOD Conference 1999: 263-274
- Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy:
Join Synopses for Approximate Query Answering.
SIGMOD Conference 1999: 275-286
- Noga Alon, Phillip B. Gibbons, Yossi Matias, Mario Szegedy:
Tracking Join and Self-Join Sizes in Limited Storage.
PODS 1999: 10-20
- Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik:
The New Jersey Data Reduction Report.
IEEE Data Eng. Bull. 20(4): 3-45(1997)
BibTeX
ACM SIGMOD Anthology - DBLP:
[Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:40:32 2009