Digital Symposium Collection 2000  

 
 
 
 
 
 

 





















Probabilistic Optimization of Top N Queries

Donko Donjerkovic and Raghu Ramakrishnan

  View Paper (PDF)  

Return to Partial Query Evaluation

Abstract
The problem of finding the best answers to a query quickly, rather than finding all answers, is of increasing importance as relational databases are applied in multimedia and decision-support domains. An approach to efficiently answering such "Top N" queries is to augment the query with an additional selection that prunes away the unwanted portion of the answer set. The risk is that if the selection returns fewer than the desired number of answers, the execution must be restarted (with a less selective filter). We propose a new, probabilistic approach to query optimization that quantifies this risk and seeks to minimize overall cost including the cost of possible restarts. We also present an experimental study to demonstrate that probabilistic Top N query optimization can significantly reduce the average query execution time with relatively modest increase in the optimization time.


References

Note: References link to DBLP on the Web.

[1]
Michael J. Carey , Donald Kossmann : Reducing the Braking Distance of an SQL Query Engine. VLDB 1998 : 158-169
[2]
Francis Chu , Joseph Y. Halpern , Praveen Seshadri : Least Expected Cost Query Optimization: An Exercise in Utility. PODS 1999 : 138-147
[3]
Min Fang , Narayanan Shivakumar , Hector Garcia-Molina , Rajeev Motwani , Jeffrey D. Ullman : Computing Iceberg Queries Efficiently. VLDB 1998 : 299-310
[4]
Yannis E. Ioannidis , Stavros Christodoulakis : On the Propagation of Errors in the Size of Join Results. SIGMOD Conference 1991 : 268-277
[5]
H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Viswanath Poosala , Kenneth C. Sevcik , Torsten Suel : Optimal Histograms with Quality Guarantees. VLDB 1998 : 275-286
[6]
Joseph M. Hellerstein , Peter J. Haas , Helen Wang : Online Aggregation. SIGMOD Conference 1997 : 171-182
[7]
...
[8]
Alon Y. Levy , Anand Rajaraman , Joann J. Ordille : Querying Heterogeneous Information Sources Using Source Descriptions. VLDB 1996 : 251-262
[9]
Michael J. Carey , Donald Kossmann : On Saying "Enough Already!" in SQL. SIGMOD Conference 1997 : 219-230
[10]
Viswanath Poosala , Yannis E. Ioannidis , Peter J. Haas , Eugene J. Shekita : Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conf. 1996 : 294-305
[11]
...
[12]
Leonard D. Shapiro : Join Processing in Database Systems with Large Main Memories. TODS 11(3) : 239-264(1986)
[13]
Surajit Chaudhuri , Rajeev Motwani , Vivek R. Narasayya : Random Sampling for Histogram Construction: How much is enough? SIGMOD Conference 1998 : 436-447
[14]
Tolga Urhan , Michael J. Franklin , Laurent Amsaleg : Cost Based Query Scrambling for Initial Delays. SIGMOD Conference 1998 : 130-141
[15]
William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery: Numerical Recipes in C, 2nd Edition. Cambridge University Press 1992
Contents
[16]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949

BIBTEX

@inproceedings{DBLP:conf/vldb/DonjerkovicR99,
  author    = {Donko Donjerkovic and
                Raghu Ramakrishnan},
   editor    = {Malcolm P. Atkinson and
                Maria E. Orlowska and
                Patrick Valduriez and
                Stanley B. Zdonik and
                Michael L. Brodie},
   title     = {Probabilistic Optimization of Top N Queries},
   booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
                Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
                UK},
   publisher = {Morgan Kaufmann},
   year      = {1999},
   isbn      = {1-55860-615-5},
   pages     = {411-422},
   crossref  = {DBLP:conf/vldb/99},
   bibsource = {DBLP, http://dblp.uni-trier.de} } },


























Copyright(C) 2000 ACM