Random Sampling from B+ Trees.
Frank Olken, Doron Rotem:
Random Sampling from B+ Trees.
VLDB 1989: 269-277
@inproceedings{DBLP:conf/vldb/OlkenR89,
author = {Frank Olken and
Doron Rotem},
editor = {Peter M. G. Apers and
Gio Wiederhold},
title = {Random Sampling from B+ Trees},
booktitle = {Proceedings of the Fifteenth International Conference on Very
Large Data Bases, August 22-25, 1989, Amsterdam, The Netherlands},
publisher = {Morgan Kaufmann},
year = {1989},
isbn = {1-55860-101-5},
pages = {269-277},
ee = {db/conf/vldb/OlkenR89.html},
crossref = {DBLP:conf/vldb/89},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
We consider the design and analysis of algorithms to retrieve simple random samples from databases.
Specifically, we examine simple random sampling from B+ tree files.
Existing methods of sampling from B+ trees require the use of auxiliary rank information in the nodes of the tree.
Such modified B+ tree files are called "ranked B+ trees".
We compare sampling from ranked B+ tree files with new acceptance/rejection (A/R) sampling methods which sample directly from standard B+ trees.
Our new A/R sampling algorithm can easily be retrofitted to existing DBMSs and does not require the overhead of maintaining rank information.
We consider both iterative and batch sampling methods.
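The acceptance/rejection idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm as published, only a minimal illustration under stated assumptions: the `Node` class, the `ar_trial` and `ar_sample` functions, and the `max_fanout` parameter are hypothetical names, the tree lives in memory, every node holds at most `max_fanout` entries, all leaves sit at the same depth, and records are never `None`. At each node a slot is drawn uniformly from the `max_fanout` possible positions, and the trial is rejected if the slot is empty, so every record is reached with the same probability per trial regardless of how full its ancestors are.

```python
import random
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Node:
    """Hypothetical in-memory B+ tree node (not a real DBMS page layout)."""
    is_leaf: bool
    entries: List[Any]  # child Nodes if internal, records if leaf


def ar_trial(root: Node, max_fanout: int, rng: random.Random) -> Optional[Any]:
    """Run one acceptance/rejection trial; return a record, or None if rejected."""
    node = root
    while True:
        # Draw a slot among all max_fanout possible positions, not just the
        # occupied ones; an empty slot means the whole trial is rejected.
        slot = rng.randrange(max_fanout)
        if slot >= len(node.entries):
            return None
        entry = node.entries[slot]
        if node.is_leaf:
            return entry
        node = entry


def ar_sample(root: Node, max_fanout: int, rng: random.Random,
              max_trials: int = 1_000_000) -> Any:
    """Repeat trials until one is accepted (one draw, with replacement)."""
    for _ in range(max_trials):
        record = ar_trial(root, max_fanout, rng)
        if record is not None:
            return record
    raise RuntimeError("no trial accepted; is the tree empty?")
```

For a tree of height h, each record is returned by a single trial with probability (1/max_fanout)^h, which is the same for all records, so accepted draws are uniform. Calling `ar_sample` repeatedly corresponds to an iterative scheme that samples with replacement; no rank information is stored in the nodes.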
Copyright © 1989 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Printed Edition
Peter M. G. Apers, Gio Wiederhold (Eds.):
Proceedings of the Fifteenth International Conference on Very Large Data Bases, August 22-25, 1989, Amsterdam, The Netherlands.
Morgan Kaufmann 1989, ISBN 1-55860-101-5
References
- [Ark84]
- ...
- [BK75]
- ...
- [Coc77]
- William G. Cochran:
Sampling Techniques, 3rd Edition.
John Wiley 1977, ISBN 0-471-16240-X
- [EN82]
- Jarmo Ernvall, Olli Nevalainen:
An Algorithm for Unbiased Random Sampling.
Comput. J. 25(1): 45-47(1982)
- [FMR62]
- ...
- [Gho86]
- Sakti P. Ghosh:
SIAM: statistics information access method.
Inf. Syst. 13(4): 359-368(1988)
- [HOT88]
- Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja:
Statistical Estimators for Relational Algebra Expressions.
PODS 1988: 276-287
- [Knu73]
- Donald E. Knuth:
The Art of Computer Programming, Volume III: Sorting and Searching.
Addison-Wesley 1973, ISBN 0-201-03803-X
- [LTA79]
- ...
- [LWW84]
- ...
- [Mon85]
- ...
- [Pal85]
- Prashant Palvia:
Expressions for Batched Searching of Sequential and Hierarchical Files.
ACM Trans. Database Syst. 10(1): 97-106(1985)
- [SL88]
- Jaideep Srivastava, Vincent Y. Lum:
A Tree Based Access Method (TBSAM) for Fast Processing of Aggregate Queries.
ICDE 1988: 504-510
- [Vit84]
- Jeffrey Scott Vitter:
Faster Methods for Random Sampling.
Commun. ACM 27(7): 703-718(1984)
- [Vit85]
- Jeffrey Scott Vitter:
Random Sampling with a Reservoir.
ACM Trans. Math. Softw. 11(1): 37-57(1985)
- [WE80]
- C. K. Wong, Malcolm C. Easton:
An Efficient Method for Weighted Sampling Without Replacement.
SIAM J. Comput. 9(1): 111-113(1980)
- [Yao77]
- S. Bing Yao:
Approximating the Number of Accesses in Database Organizations.
Commun. ACM 20(4): 260-261(1977)
Referenced by
- Phillip B. Gibbons, Yossi Matias:
New Sampling-Based Summary Statistics for Improving Approximate Query Answers.
SIGMOD Conference 1998: 331-342
- Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik:
The New Jersey Data Reduction Report.
IEEE Data Eng. Bull. 20(4): 3-45(1997)
- Gennady Antoshenkov, Mohamed Ziauddin:
Query Processing and Optimization in Oracle Rdb.
VLDB J. 5(4): 229-237(1996)
- Hannu Toivonen:
Sampling Large Databases for Association Rules.
VLDB 1996: 134-145
- Nabil I. Hachem, Chenye Bao, Steve Taylor:
Approximate Query Answering In Numerical Databases.
SSDBM 1996: 63-73
- Augustine C. Ikeji, Farshad Fotouhi:
Computation of Partial Query Results Using An Adaptive Stratified Sampling Technique.
CIKM 1995: 145-149
- Jason Tsong-Li Wang, Gung-Wei Chirn, Thomas G. Marr, Bruce A. Shapiro, Dennis Shasha, Kaizhong Zhang:
Combinatorial Pattern Discovery for Scientific Data: Some Preliminary Results.
SIGMOD Conference 1994: 115-125
- Peter J. Haas, Jeffrey F. Naughton, Arun N. Swami:
On the Relative Cost of Sampling for Join Selectivity Estimation.
PODS 1994: 14-24
- Wen-Chi Hou, Gultekin Özsoyoglu:
Processing Time-Constrained Aggregate Queries in CASE-DB.
ACM Trans. Database Syst. 18(2): 224-261(1993)
- Gennady Antoshenkov:
Query Processing in DEC Rdb: Major Issues and Future Challenges.
IEEE Data Eng. Bull. 16(4): 42-52(1993)
- Richard H. Wolniewicz, Goetz Graefe:
Algebraic Optimization of Computations over Scientific Databases.
VLDB 1993: 13-24
- Frank Olken, Doron Rotem:
Sampling from Spatial Databases.
ICDE 1993: 199-208
- Gennady Antoshenkov:
Dynamic Query Optimization in Rdb/VMS.
ICDE 1993: 538-547
- David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri:
Practical Skew Handling in Parallel Joins.
VLDB 1992: 27-40
- Gennady Antoshenkov:
Random Sampling from Pseudo-Ranked B+ Trees.
VLDB 1992: 375-382
- Peter J. Haas, Arun N. Swami:
Sequential Sampling Procedures for Query Size Estimation.
SIGMOD Conference 1992: 341-350
- Frank Olken, Doron Rotem:
Maintenance of Materialized Views of Sampling Queries.
ICDE 1992: 632-641
- David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider:
An Evaluation of Non-Equijoin Algorithms.
VLDB 1991: 443-452
- Frank Olken, Doron Rotem:
Random Sampling from Database Files: A Survey.
SSDBM 1990: 92-111
- Frank Olken, Doron Rotem, Ping Xu:
Random Sampling from Hash Files.
SIGMOD Conference 1990: 375-386
- Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider:
Practical Selectivity Estimation through Adaptive Sampling.
SIGMOD Conference 1990: 1-11
- Jeffrey F. Naughton, S. Seshadri:
On Estimating the Size of Projections.
ICDT 1990: 499-513
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:45:41 2009