Self-Tuning Histograms: Building Histograms Without Looking at Data

Ashraf Aboulnaga*       Surajit Chaudhuri
University of Wisconsin - Madison       Microsoft Research
ashraf@cs.wisc.edu       surajitc@microsoft.com

Abstract

In this paper, we introduce self-tuning histograms. Although similar in structure to traditional histograms, these histograms infer data distributions not by examining the data or a sample thereof, but by using feedback from the query execution engine about the actual selectivity of range selection operators to progressively refine the histogram. Since the cost of building and maintaining self-tuning histograms is independent of the data size, self-tuning histograms provide a remarkably inexpensive way to construct "approximate" histograms for large data sets with little up-front cost. Self-tuning histograms are particularly attractive as an alternative to higher dimensional traditional histograms that capture correlation of attributes but are expensive to build and maintain. In this paper, we describe the techniques for initializing and refining single as well as multi-dimensional self-tuning histograms. Our experimental results show that the accuracy of self-tuning histograms is close to that of traditional histograms. Thus, this class of histograms provides a "low-cost and reasonably accurate" alternative to traditional histograms.