ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Random Sampling from Hash Files.

Frank Olken, Doron Rotem, Ping Xu: Random Sampling from Hash Files. SIGMOD Conference 1990: 375-386
@inproceedings{DBLP:conf/sigmod/OlkenRX90,
  author    = {Frank Olken and
               Doron Rotem and
               Ping Xu},
  editor    = {Hector Garcia-Molina and
               H. V. Jagadish},
  title     = {Random Sampling from Hash Files},
  booktitle = {Proceedings of the 1990 ACM SIGMOD International Conference on
               Management of Data, Atlantic City, NJ, May 23-25, 1990},
  publisher = {ACM Press},
  year      = {1990},
  pages     = {375-386},
  ee        = {http://doi.acm.org/10.1145/93597.98746, db/conf/sigmod/OlkenRX90.html},
  crossref  = {DBLP:conf/sigmod/90},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

In this paper we discuss simple random sampling from hash files on secondary storage. We consider both iterative and batch sampling algorithms from both static and dynamic hashing methods. The static methods considered are open addressing hash files and hash files with separate overflow chains. The dynamic hashing methods considered are Linear Hash files [Lit80] and Extendible Hash files [FNPS79] We give the cost of sampling in terms of the cost of successfully searching a hash file and show how to exploit features of the dynamic hashing methods to improve sampling efficiency.

Copyright © 1990 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

Online Version (ACM WWW Account required): Full Text in PDF Format

CDROM Version: Load the CDROM "Volume 1 Issue 2, SIGMOD '75-'92" and ...

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Hector Garcia-Molina, H. V. Jagadish (Eds.): Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data, Atlantic City, NJ, May 23-25, 1990. ACM Press 1990 BibTeX , SIGMOD Record 19(2), June 1990
Contents

Online Edition: ACM Digital Library


References

[Ark84]
...
[Car75]
Alfonso F. Cardenas: Analysis and Performance of Inverted Data Base Structures. Commun. ACM 18(5): 253-263(1975) BibTeX
[Coc77]
William G. Cochran: Sampling Techniques, 3rd Edition. John Wiley 1977, ISBN 0-471-16240-X
BibTeX
[Den80]
Dorothy E. Denning: Secure Statistical Databases with Random Sample Queries. ACM Trans. Database Syst. 5(3): 291-315(1980) BibTeX
[FNPS79]
Ronald Fagin, Jürg Nievergelt, Nicholas Pippenger, H. Raymond Strong: Extendible Hashing - A Fast Access Method for Dynamic Files. ACM Trans. Database Syst. 4(3): 315-344(1979) BibTeX
[HOT88]
Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja: Statistical Estimators for Relational Algebra Expressions. PODS 1988: 276-287 BibTeX
[HOT89]
Wen-Chi Hou, Gultekin Özsoyoglu, Baldeo K. Taneja: Processing Aggregate Relational Queries with Hard Time Constraints. SIGMOD Conference 1989: 68-77 BibTeX
[JK77]
...
[Knu73]
Donald E. Knuth: The Art of Computer Programming, Volume III: Sorting and Searching. Addison-Wesley 1973, ISBN 0-201-03803-X
BibTeX
[Lar82]
Per-Åke Larson: Performance Analysis of Linear Hashing with Partial Expansions. ACM Trans. Database Syst. 7(4): 566-587(1982) BibTeX
[Lit80]
Witold Litwin: Linear Hashing: A New Tool for File and Table Addressing. VLDB 1980: 212-223 BibTeX
[LN89]
Richard J. Lipton, Jeffrey F. Naughton: Estimating the Size of Generalized Transitive Closures. VLDB 1989: 165-171 BibTeX
[LTA79]
...
[Lu87]
Hongjun Lu: New Strategies for Computing the Transitive Closure of a Database Relation. VLDB 1987: 267-274 BibTeX
[LWW84]
...
[Mon85]
...
[Mor80]
...
[OR86]
Frank Olken, Doron Rotem: Simple Random Sampling from Relational Databases. VLDB 1986: 160-169 BibTeX
[OR89]
Frank Olken, Doron Rotem: Random Sampling from B+ Trees. VLDB 1989: 269-277 BibTeX
[Vit85]
Jeffrey Scott Vitter: Random Sampling with a Reservoir. ACM Trans. Math. Softw. 11(1): 37-57(1985) BibTeX
[Wil84]
Dan E. Willard: Sampling Algorithms for Differential Batch Retrieval Problems (Extended Abstract). ICALP 1984: 514-526 BibTeX
[Xu89]
...

Referenced by

  1. Christopher R. Palmer, Christos Faloutsos: Density Biased Sampling: An Improved Method for Data Mining and Clustering. SIGMOD Conference 2000: 82-92
  2. Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997)
  3. Peter J. Haas, Jeffrey F. Naughton, Arun N. Swami: On the Relative Cost of Sampling for Join Selectivity Estimation. PODS 1994: 14-24
  4. Wen-Chi Hou, Gultekin Özsoyoglu: Processing Time-Constrained Aggregate Queries in CASE-DB. ACM Trans. Database Syst. 18(2): 224-261(1993)
  5. Gennady Antoshenkov: Query Processing in DEC Rdb: Major Issues and Future Challenges. IEEE Data Eng. Bull. 16(4): 42-52(1993)
  6. Richard H. Wolniewicz, Goetz Graefe: Algebraic Optimization of Computations over Scientific Databases. VLDB 1993: 13-24
  7. David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider, S. Seshadri: Practical Skew Handling in Parallel Joins. VLDB 1992: 27-40
  8. Frank Olken, Doron Rotem: Maintenance of Materialized Views of Sampling Queries. ICDE 1992: 632-641
  9. David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider: An Evaluation of Non-Equijoin Algorithms. VLDB 1991: 443-452
  10. Jeffrey F. Naughton, S. Seshadri: On Estimating the Size of Projections. ICDT 1990: 499-513
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:40:03 2009