Digital Symposium Collection 2000  

 
 
 
 
 
 

 





















Selectivity Estimation in Spatial Databases

Swarup Acharya, Viswanath Poosala, and Sridhar Ramaswamy

  View Paper (PDF)  

Return to Spatial Databases

Abstract
Selectivity estimation of queries is an important and well-studied problem in relational database systems. In this paper, we examine selectivity estimation in the context of Geographic Information Systems, which manage spatial data such as points, lines, poly-lines and polygons. In particular, we focus on point and range queries over two-dimensional rectangular data. We propose several techniques based on using spatial indices, histograms, binary space partitionings (BSPs), and the novel notion of spatial skew. Our techniques carefully partition the input rectangles into subsets and approximate each partition accurately. We present a detailed experimental study comparing the proposed techniques and the best known sampling and parametric techniques. We evaluate them using synthetic as well as real-life TIGER datasets. Based on our experiments, we identify a BSP based partitioning that we call Min-Skew which consistently provides the most accurate selectivity estimates for spatial queries. The Min-Skew partitioning can be constructed efficiently, occupies very little space, and provides accurate selectivity estimates over a broad range of spatial queries.


References

Note: References link to DBLP on the Web.

[Ant92]
Gennady Antoshenkov : Random Sampling from Pseudo-Ranked B+ Trees. VLDB 1992 : 375-382
[Aok97]
Paul M. Aoki : Generalizing ``Search'' in Generalized Search Trees (Extended Abstract). ICDE 1998 : 380-389
[APR99]
...
[ARC93]
Environmental Systems Research Institute (Ed.): Understanding GIS - The ARC/INFO Method, 3rd Ed. GeoInformation International / John Wiley 1995, ISBN 1-899761-04-7 and 0-470-24987-0 (USA)
[BF95]
Alberto Belussi , Christos Faloutsos : Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension. VLDB 1995 : 299-310
[BKSS90]
Norbert Beckmann , Hans-Peter Kriegel , Ralf Schneider , Bernhard Seeger : The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990 : 322-331
[CR94]
Chungmin Melvin Chen , Nick Roussopoulos : Adaptive Selectivity Estimation Using Query Feedback. SIGMOD Conference 1994 : 161-172
[DeW94]
David J. DeWitt , Navin Kabra , Jun Luo , Jignesh M. Patel , Jie-Bing Yu : Client-Server Paradise. VLDB 1994 : 558-569
[GG82]
Erol Gelenbe , Danièle Gardy : The Size of Projections of Relations Satisfying a Functional Dependency. VLDB 1982 : 325-333
[GRSS97]
Stéphane Grumbach , Philippe Rigaux , Michel Scholl , Luc Segoufin : DEDALE, A Spatial Constraint Database. DBPL 1997 : 38-59
[Gut85]
Antonin Guttman : R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984 : 47-57
[HNSS95]
Peter J. Haas , Jeffrey F. Naughton , S. Seshadri , Lynne Stokes : Sampling-Based Estimation of the Number of Distinct Values of an Attribute. VLDB 1995 : 311-322
[Int97]
...
[KMS97]
Sanjeev Khanna , S. Muthukrishnan , Steven Skiena : Efficient Array Partitioning. ICALP 1997 : 616-626
[Koo80]
Robert Kooi : The Optimization of Queries in Relational Databases. Ph.D. thesis, Case Western Reserve University 1980
[LNS90]
Richard J. Lipton , Jeffrey F. Naughton , Donovan A. Schneider : Practical Selectivity Estimation through Adaptive Sampling. SIGMOD Conference 1990 : 1-11
[Map98]
...
[MPS99]
S. Muthukrishnan , Viswanath Poosala , Torsten Suel : On Rectangular Partitionings in Two Dimensions: Algorithms, Complexity, and Applications. ICDT 1999 : 236-256
[PI97]
Viswanath Poosala , Yannis E. Ioannidis : Selectivity Estimation Without the Attribute Value Independence Assumption. VLDB 1997 : 486-495
[PIHS96]
Viswanath Poosala , Yannis E. Ioannidis , Peter J. Haas , Eugene J. Shekita : Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conf. 1996 : 294-305
[Poo97]
Viswanath Poosala : Histogram-Based Estimation Techniques in Database Systems. Ph.D. thesis, Univ. of Wisconsin-Madison 1997
[PSC84]
Gregory Piatetsky-Shapiro , Charles Connell : Accurate Estimation of the Number of Tuples Satisfying a Condition. SIGMOD Conference 1984 : 256-276
[SAC+79]
Patricia G. Selinger , Morton M. Astrahan , Donald D. Chamberlin , Raymond A. Lorie , Thomas G. Price : Access Path Selection in a Relational Database Management System. SIGMOD Conference 1979 : 23-34
[Sam89a]
...
[Sam89b]
Hanan Samet : The Design and Analysis of Spatial Data Structures. Addison-Wesley 1990
[SFGM93]
Michael Stonebraker , James Frew , Kenn Gardels , Jeff Meredith : The Sequoia 2000 Benchmark. SIGMOD Conference 1993 : 2-11
[SRF87]
Timos K. Sellis , Nick Roussopoulos , Christos Faloutsos : The R+-Tree: A Dynamic Index for Multi-Dimensional Objects. VLDB 1987 : 507-518
[tig92]
...
[TS96]
Yannis Theodoridis , Timos K. Sellis : A Model for the Prediction of R-tree Performance. PODS 1996 : 161-171
[Ube94]
Michael Ubell : The Montage Extensible DataBlade Achitecture. SIGMOD Conference 1994 : 482
[Zip49]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949

BIBTEX

@inproceedings{DBLP:conf/sigmod/AcharyaPR99,
  author    = {Swarup Acharya and
                Viswanath Poosala and
                Sridhar Ramaswamy},
   editor    = {Alex Delis and
                Christos Faloutsos and
                Shahram Ghandeharizadeh},
   title     = {Selectivity Estimation in Spatial Databases},
   booktitle = {SIGMOD 1999, Proceedings ACM SIGMOD International Conference
                on Management of Data, June 1-3, 1999, Philadephia, Pennsylvania,
                USA},
   publisher = {ACM Press},
   year      = {1999},
   isbn      = {1-58113-084-8},
   pages     = {13-24},
   crossref  = {DBLP:conf/sigmod/99},
   bibsource = {DBLP, http://dblp.uni-trier.de} } },


























Copyright(C) 2000 ACM