ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Approximate Medians and other Quantiles in One Pass and with Limited Memory.

Gurmeet Singh Manku, Sridhar Rajagopalan, Bruce G. Lindsay: Approximate Medians and other Quantiles in One Pass and with Limited Memory. SIGMOD Conference 1998: 426-435
@inproceedings{DBLP:conf/sigmod/RajagopalanML98,
  author    = {Gurmeet Singh Manku and
               Sridhar Rajagopalan and
               Bruce G. Lindsay},
  editor    = {Laura M. Haas and
               Ashutosh Tiwary},
  title     = {Approximate Medians and other Quantiles in One Pass and with
               Limited Memory},
  booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
               on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
  publisher = {ACM Press},
  year      = {1998},
  isbn      = {0-89791-995-5},
  pages     = {426-435},
  ee        = {http://doi.acm.org/10.1145/276304.276342, db/conf/sigmod/RajagopalanML98.html},
  crossref  = {DBLP:conf/sigmod/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

We present new algorithms for computing approximate quantiles of large datasets in a single pass. The approximation guarantees are explicit, and apply without regard to the value distribution or the arrival distribution of the dataset. The main memory requirements are smaller than those reported earlier by an order of magnitude.

We also discuss methods that couple the approximation algorithms with random sampling to further reduce memory requirements. With sampling, the approximation guarantees are explicit but probabilistic, i.e. they apply with respect to a (user controlled) confidence parameter.

We present the algorithms, their theoretical analysis and simulation results.

Copyright © 1998 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD DiSC

CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ... Online Version (ACM WWW Account required): Full Text in PDF Format

ACM SIGMOD Anthology

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Laura M. Haas, Ashutosh Tiwary (Eds.): SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA. ACM Press 1998, ISBN 0-89791-995-5 BibTeX , SIGMOD Record 27(2), June 1998
Contents

Online Edition: ACM SIGMOD

[Abstract]
[Full Text (Postscript)]

References

[1]
Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price: Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979: 23-34 BibTeX
[2]
Gregory Piatetsky-Shapiro, Charles Connell: Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984: 256-276 BibTeX
[3]
Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305 BibTeX
[4]
...
[5]
...
[6]
David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider: Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting. PDIS 1991: 280-291 BibTeX
[7]
Manuel Blum, Robert W. Floyd, Vaughan R. Pratt, Ronald L. Rivest, Robert Endre Tarjan: Time Bounds for Selection. J. Comput. Syst. Sci. 7(4): 448-461(1973) BibTeX
[8]
...
[9]
...
[10]
...
[11]
...
[12]
...
[13]
...
[14]
...
[15]
J. Ian Munro, Mike Paterson: Selection and Sorting with Limited Storage. Theor. Comput. Sci. 12: 315-323(1980) BibTeX
[16]
Raj Jain, Imrich Chlamtac: The P² Algorithm for Dynamic Calculation of Quantiiles and Histograms Without Storing Observations. Commun. ACM 28(10): 1076-1085(1985) BibTeX
[17]
Rakesh Agrawal, Arun N. Swami: A One-Pass Space-Efficient Algorithm for Finding Quantiles. COMAD 1995: 0- BibTeX
[18]
Khaled Alsabti, Sanjay Ranka, Vineet Singh: A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data. VLDB 1997: 346-355 BibTeX
[19]
...

Referenced by

  1. Haixun Wang, Carlo Zaniolo: Using SQL to Build New Aggregates and Extenders for Object- Relational Systems. VLDB 2000: 166-175
  2. Yossi Matias, Jeffrey Scott Vitter, Min Wang: Dynamic Maintenance of Wavelet-Based Histograms. VLDB 2000: 101-110
  3. Aristides Gionis, Piotr Indyk, Rajeev Motwani: Similarity Search in High Dimensions via Hashing. VLDB 1999: 518-529
  4. Peter J. Haas: Techniques for Online Exploration of Large Object-Relational Datasets. SSDBM 1999: 4-12
  5. Gurmeet Singh Manku, Sridhar Rajagopalan, Bruce G. Lindsay: Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets. SIGMOD Conference 1999: 251-262
  6. Ashraf Aboulnaga, Surajit Chaudhuri: Self-tuning Histograms: Building Histograms Without Looking at Data. SIGMOD Conference 1999: 181-192
  7. Noga Alon, Phillip B. Gibbons, Yossi Matias, Mario Szegedy: Tracking Join and Self-Join Sizes in Limited Storage. PODS 1999: 10-20
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:40:44 2009