Selectivity Estimation in Spatial Databases

Swarup Acharya       Viswanath Poosala       Sridhar Ramaswamy*
Bell Labs       Bell Labs       Bell Labs
swarup@research.bell-labs.com       poosala@research.bell-labs.com       sridhar@research.bell-labs.com

Abstract

Selectivity estimation of queries is an important and well-studied problem in relational database systems. In this paper, we examine selectivity estimation in the context of Geographic Information Systems, which manage spatial data such as points, lines, poly-lines and polygons. In particular, we focus on point and range queries over two-dimensional rectangular data. We propose several techniques based on using spatial indices, binary space partitionings (BSPs), and the novel notion of spatial skew. Our techniques carefully partition the input rectangles into subsets in order to approximate the input accurately. We present a detailed experimental study comparing the proposed techniques and the best known sampling and parametric techniques. We evaluate them using synthetic as well as real-life TIGER and Sequoia data sets. Based on our experiments, we identify a BSP based partitioning that we call Min-Skew which consistently provides the most accurate selectivity estimates for spatial queries. The Min-Skew partitioning can be constructed efficiently, occupies very little space, and provides consistently good selectivity estimates.