Digital Symposium Collection 2000  

 
 
 
 
 
 

 





















On Random Sampling over Joins

Surajit Chaudhuri, Rajeev Motwani, and Vivek R. Narasayya

  View Paper (PDF)  

Return to Sampling - Approximate Answers

Abstract
A major bottleneck in implementing sampling as a primitive relational operation is the inefficiency of sampling the output of a query. It is not even known whether it is possible to generate a sample of a join tree without first evaluating the join tree completely. We undertake a detailed study of this problem and attempt to analyze it in a variety of settings. We present theoretical results explaining the difficulty of this problem and setting limits on the efficiency that can be achieved. Based on new insights into the interaction between join and sampling, we develop join sampling techniques for the settings where our negative results do not apply. Our new sampling algorithms are signifficantly more efficient than those known earlier. We present experimental evaluation of our techniques on Microsoft's SQL Server 7.0.


References

Note: References link to DBLP on the Web.

[1]
Surajit Chaudhuri , Rajeev Motwani , Vivek R. Narasayya : Random Sampling for Histogram Construction: How much is enough? SIGMOD Conference 1998 : 436-447
[2]
Sumit Ganguly , Phillip B. Gibbons , Yossi Matias , Abraham Silberschatz : Bifocal Sampling for Skew-Resistant Join Size Estimation. SIGMOD Conf. 1996 : 271-281
[3]
Peter J. Haas , Jeffrey F. Naughton , Arun N. Swami : On the Relative Cost of Sampling for Join Selectivity Estimation. PODS 1994 : 14-24
[4]
Joseph M. Hellerstein , Peter J. Haas , Helen Wang : Online Aggregation. SIGMOD Conference 1997 : 171-182
[5]
Wen-Chi Hou , Gultekin Özsoyoglu , Erdogan Dogdu : Error-Constraint COUNT Query Evaluation in Relational Databases. SIGMOD Conference 1991 : 278-287
[6]
Richard J. Lipton , Jeffrey F. Naughton , Donovan A. Schneider , S. Seshadri : Efficient Sampling Strategies for Relational Database Operations. TCS 116(1&2) : 195-226(1993)
[7]
Rajeev Motwani , Prabhakar Raghavan : Randomized Algorithms. Cambridge University Press 1995, ISBN 0-521-47465-5
[8]
Jeffrey F. Naughton , S. Seshadri : On Estimating the Size of Projections. ICDT 1990 : 499-513
[9]
Frank Olken , Doron Rotem : Simple Random Sampling from Relational Databases. VLDB 1986 : 160-169
[10]
...
[11]
Gregory Piatetsky-Shapiro , Charles Connell : Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984 : 256-276
[12]
Jeffrey Scott Vitter : Random Sampling with a Reservoir. TOMS 11(1) : 37-57(1985)
[13]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949

BIBTEX

@inproceedings{DBLP:conf/sigmod/ChaudhuriMN99,
  author    = {Surajit Chaudhuri and
                Rajeev Motwani and
                Vivek R. Narasayya},
   editor    = {Alex Delis and
                Christos Faloutsos and
                Shahram Ghandeharizadeh},
   title     = {On Random Sampling over Joins},
   booktitle = {SIGMOD 1999, Proceedings ACM SIGMOD International Conference
                on Management of Data, June 1-3, 1999, Philadephia, Pennsylvania,
                USA},
   publisher = {ACM Press},
   year      = {1999},
   isbn      = {1-58113-084-8},
   pages     = {263-274},
   crossref  = {DBLP:conf/sigmod/99},
   bibsource = {DBLP, http://dblp.uni-trier.de} } },


























Copyright(C) 2000 ACM