New Sampling-Based Summary Statistics for Improving Approximate Query Answers.

Phillip B. Gibbons, Yossi Matias: New Sampling-Based Summary Statistics for Improving Approximate Query Answers. SIGMOD Conference 1998: 331-342

@inproceedings{DBLP:conf/sigmod/GibbonsM98,
  author    = {Phillip B. Gibbons and
               Yossi Matias},
  editor    = {Laura M. Haas and
               Ashutosh Tiwary},
  title     = {New Sampling-Based Summary Statistics for Improving Approximate
               Query Answers},
  booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
               on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
  publisher = {ACM Press},
  year      = {1998},
  isbn      = {0-89791-995-5},
  pages     = {331-342},
  ee        = {http://doi.acm.org/10.1145/276304.276334, db/conf/sigmod/GibbonsM98.html},
  crossref  = {DBLP:conf/sigmod/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

BibTeX

Abstract

In large data recording and warehousing environments, it is often advantageous to provide fast, approximate answers to queries, whenever possible. Before DBMSs providing highly-accurate approximate answers can become a reality, many new techniques for summarizing data and for estimating answers from summarized data must be developed. This paper introduces two new sampling-based summary statistics, concise samples and counting samples, and presents new techniques for their fast incremental maintenance regardless of the data distribution. We quantify their advantages over standard sample views in terms of the number of additional sample points for the same view size, and hence in providing more accurate query answers. Finally, we consider their application to providing fast approximate answers to hot list queries. Our algorithms maintain their accuracy in the presence of ongoing insertions to the data warehouse.

Copyright © 1998 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.

ACM SIGMOD DiSC

CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ...

Windows: Click the letter of your CD drive
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Mac: Click here
UNIX/LINUX: mount the CD and click on the path of your mount point:
/Anthology/Disc99_1 or /cdrom

Online Version (ACM WWW Account required): Full Text in PDF Format

ACM SIGMOD Anthology

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...

Windows: Click the letter of your CD drive
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Mac: Click here
UNIX/LINUX: mount the DVD and click on the path of your mount point:
/Anthology/aDVD1 or /dvd

BibTeX

Printed Edition

Laura M. Haas, Ashutosh Tiwary (Eds.): SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA. ACM Press 1998, ISBN 0-89791-995-5 BibTeX , SIGMOD Record 27(2), June 1998
Contents

Online Edition: ACM SIGMOD

[Abstract]
[Full Text (Postscript)]

References

[AMS96]: ...
[Ant92]: Gennady Antoshenkov: Random Sampling from Pseudo-Ranked B+ Trees. VLDB 1992: 375-382 BibTeX
[AS94]: Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499 BibTeX
[AZ96]: Gennady Antoshenkov, Mohamed Ziauddin: Query Processing and Optimization in Oracle Rdb. VLDB J. 5(4): 229-237(1996) BibTeX
[BDF+97]: Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997) BibTeX
[BM96]: Roberto J. Bayardo Jr., Daniel P. Miranker: Processing Queries for First Few Answers. CIKM 1996: 45-52 BibTeX
[BMUT97]: Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, Shalom Tsur: Dynamic Itemset Counting and Implication Rules for Market Basket Data. SIGMOD Conference 1997: 255-264 BibTeX
[FJS97]: Christos Faloutsos, H. V. Jagadish, Nikolaos Sidiropoulos: Recovering Information from Summary Data. VLDB 1997: 36-45 BibTeX
[Fla85]: Philippe Flajolet: Approximate Counting: A Detailed Analysis. BIT 25(1): 113-134(1985) BibTeX
[FM83]: Philippe Flajolet, G. Nigel Martin: Probabilistic Counting. FOCS 1983: 76-82 BibTeX
[FM85]: Philippe Flajolet, G. Nigel Martin: Probabilistic Counting Algorithms for Data Base Applications. J. Comput. Syst. Sci. 31(2): 182-209(1985) BibTeX
[GM95]: ...
[GM97]: ...
[GMP97a]: ...
[GMP97b]: Phillip B. Gibbons, Yossi Matias, Viswanath Poosala: Fast Incremental Maintenance of Approximate Histograms. VLDB 1997: 466-475 BibTeX
[GPA+98]: ...
[HHW97]: Joseph M. Hellerstein, Peter J. Haas, Helen J. Wang: Online Aggregation. SIGMOD Conference 1997: 171-182 BibTeX
[HK95]: ...
[HNSS95]: Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Lynne Stokes: Sampling-Based Estimation of the Number of Distinct Values of an Attribute. VLDB 1995: 311-322 BibTeX
[IC93]: Yannis E. Ioannidis, Stavros Christodoulakis: Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results. ACM Trans. Database Syst. 18(4): 709-748(1993) BibTeX
[Ioa93]: Yannis E. Ioannidis: Universality of Serial Histograms. VLDB 1993: 256-267 BibTeX
[IP95]: Yannis E. Ioannidis, Viswanath Poosala: Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD Conference 1995: 233-244 BibTeX
[Mat92]: ...
[Mor78]: Robert Morris: Counting Large Numbers of Events in Small Registers. Commun. ACM 21(10): 840-842(1978) BibTeX
[MSY96]: ...
[MVN93]: ...
[MVY94]: ...
[OR89]: Frank Olken, Doron Rotem: Random Sampling from B+ Trees. VLDB 1989: 269-277 BibTeX
[OR92]: Frank Olken, Doron Rotem: Maintenance of Materialized Views of Sampling Queries. ICDE 1992: 632-641 BibTeX
[PIHS96]: Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita: Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conference 1996: 294-305 BibTeX
[Pre97]: ...
[Vit85]: Jeffrey Scott Vitter: Random Sampling with a Reservoir. ACM Trans. Math. Softw. 11(1): 37-57(1985) BibTeX
[VL93]: Susan V. Vrbsky, Jane W.-S. Liu: APPROXIMATE - A Query Processor that Produces Monotonically Improving Approximate Answers. IEEE Trans. Knowl. Data Eng. 5(6): 1056-1068(1993) BibTeX
[WVZT90]: Kyu-Young Whang, Brad T. Vander Zanden, Howard M. Taylor: A Linear-Time Probabilistic Counting Algorithm for Database Applications. ACM Trans. Database Syst. 15(2): 208-229(1990) BibTeX

Referenced by

Francesco Buccafurri, Filippo Furfaro, Domenico Saccà: Estimating Range Queries Using Aggregate Data with Integrity Constraints: A Probabilistic Approach. ICDT 2001: 390-404
Chris Olston, Jennifer Widom: Offering a Precision-Performance Tradeoff for Aggregation Queries over Replicated Data. VLDB 2000: 144-155
Venkatesh Ganti, Mong-Li Lee, Raghu Ramakrishnan: ICICLES: Self-Tuning Samples for Approximate Query Answering. VLDB 2000: 176-187
Christopher R. Palmer, Christos Faloutsos: Density Biased Sampling: An Improved Method for Data Mining and Clustering. SIGMOD Conference 2000: 82-92
Dimitrios Gunopulos, George Kollios, Vassilis J. Tsotras, Carlotta Domeniconi: Approximating Multi-Dimensional Aggregate Range Queries over Real Attributes. SIGMOD Conference 2000: 463-474
Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala: Congressional Samples for Approximate Answering of Group-By Queries. SIGMOD Conference 2000: 487-498
Vijayshankar Raman, Bhaskaran Raman, Joseph M. Hellerstein: Online Dynamic Reordering for Interactive Data Processing. VLDB 1999: 709-720
Arnd Christian König, Gerhard Weikum: Combining Histograms and Parametric Curve Fitting for Feedback-Driven Query Result-size Estimation. VLDB 1999: 423-434
H. V. Jagadish, Olga Kapitskaia, Raymond T. Ng, Divesh Srivastava: Multi-Dimensional Substring Selectivity Estimation. VLDB 1999: 387-398
Yannis E. Ioannidis, Viswanath Poosala: Histogram-Based Approximation of Set-Valued Query-Answers. VLDB 1999: 174-185
Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala: Aqua: A Fast Decision Support Systems Using Approximate Query Answers. VLDB 1999: 754-757
Viswanath Poosala, Venkatesh Ganti: Fast Approximate Answers to Aggregate Queries on a Data Cube. SSDBM 1999: 24-33
Peter J. Haas: Techniques for Online Exploration of Large Object-Relational Datasets. SSDBM 1999: 4-12
Jeffrey Scott Vitter, Min Wang: Approximate Computation of Multidimensional Aggregates of Sparse Data Using Wavelets. SIGMOD Conference 1999: 193-204
Gurmeet Singh Manku, Sridhar Rajagopalan, Bruce G. Lindsay: Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets. SIGMOD Conference 1999: 251-262
Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy: Join Synopses for Approximate Query Answering. SIGMOD Conference 1999: 275-286
Michael Benedikt, Leonid Libkin: Exact and Approximate Aggregation in Constraint Query. PODS 1999: 102-113
Noga Alon, Phillip B. Gibbons, Yossi Matias, Mario Szegedy: Tracking Join and Self-Join Sizes in Limited Storage. PODS 1999: 10-20
Yossi Matias, Jeffrey Scott Vitter, Min Wang: Wavelet-Based Histograms for Selectivity Estimation. SIGMOD Conference 1998: 448-459

BibTeX

ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]