|




















|
|
 |
|
 |
On Random Sampling over Joins
|
Surajit Chaudhuri,
Rajeev Motwani, and
Vivek R. Narasayya
View Paper (PDF)
Return to Sampling - Approximate Answers
A major bottleneck in implementing sampling as a primitive relational operation is the inefficiency of sampling the output of a query. It is not even known whether it is possible to generate a sample of a join tree without first evaluating the join tree completely. We undertake a detailed study of this problem and attempt to analyze it in a variety of settings. We present theoretical results explaining the difficulty of this problem and setting limits on the efficiency that can be achieved. Based on new insights into the interaction between join and sampling, we develop join sampling techniques for the settings where our negative results do not apply. Our new sampling algorithms are signifficantly more efficient than those known earlier. We present experimental evaluation of our techniques on Microsoft's SQL Server 7.0.
Note: References link to DBLP on the Web.
-
[1]
-
Surajit Chaudhuri
,
Rajeev Motwani
,
Vivek R. Narasayya
: Random Sampling for Histogram Construction: How much is enough?
SIGMOD Conference 1998
: 436-447
-
[2]
-
Sumit Ganguly
,
Phillip B. Gibbons
,
Yossi Matias
,
Abraham Silberschatz
: Bifocal Sampling for Skew-Resistant Join Size Estimation.
SIGMOD Conf. 1996
: 271-281
-
[3]
-
Peter J. Haas
,
Jeffrey F. Naughton
,
Arun N. Swami
: On the Relative Cost of Sampling for Join Selectivity Estimation.
PODS 1994
: 14-24
-
[4]
-
Joseph M. Hellerstein
,
Peter J. Haas
,
Helen Wang
: Online Aggregation.
SIGMOD Conference 1997
: 171-182
-
[5]
-
Wen-Chi Hou
,
Gultekin Özsoyoglu
,
Erdogan Dogdu
: Error-Constraint COUNT Query Evaluation in Relational Databases.
SIGMOD Conference 1991
: 278-287
-
[6]
-
Richard J. Lipton
,
Jeffrey F. Naughton
,
Donovan A. Schneider
,
S. Seshadri
: Efficient Sampling Strategies for Relational Database Operations.
TCS 116(1&2)
: 195-226(1993)
-
[7]
-
Rajeev Motwani
,
Prabhakar Raghavan
: Randomized Algorithms. Cambridge University Press 1995, ISBN 0-521-47465-5
-
[8]
-
Jeffrey F. Naughton
,
S. Seshadri
: On Estimating the Size of Projections.
ICDT 1990
: 499-513
-
[9]
-
Frank Olken
,
Doron Rotem
: Simple Random Sampling from Relational Databases.
VLDB 1986
: 160-169
-
[10]
-
...
-
[11]
-
Gregory Piatetsky-Shapiro
,
Charles Connell
: Accurate Estimation of the Number of Tuples Satisfying a Condition.
SIGMOD Conference 1984
: 256-276
-
[12]
-
Jeffrey Scott Vitter
: Random Sampling with a Reservoir.
TOMS 11(1)
: 37-57(1985)
-
[13]
-
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949
@inproceedings{DBLP:conf/sigmod/ChaudhuriMN99,
author = {Surajit Chaudhuri and
Rajeev Motwani and
Vivek R. Narasayya},
editor = {Alex Delis and
Christos Faloutsos and
Shahram Ghandeharizadeh},
title = {On Random Sampling over Joins},
booktitle = {SIGMOD 1999, Proceedings ACM SIGMOD International Conference
on Management of Data, June 1-3, 1999, Philadephia, Pennsylvania,
USA},
publisher = {ACM Press},
year = {1999},
isbn = {1-58113-084-8},
pages = {263-274},
crossref = {DBLP:conf/sigmod/99},
bibsource = {DBLP, http://dblp.uni-trier.de} } },
Copyright(C) 2000 ACM
|
|
|
|
|
|
|