New Sampling-Based Summary Statistics
for Improving Approximate Query Answers
Phillip B. Gibbons (Bell Laboratories)
Yossi Matias (Tel-Aviv University)
In large data recording and warehousing environments, it is
often advantageous to provide fast, approximate answers to
queries, whenever possible. Before DBMSs providing
highly-accurate approximate answers can become a reality, many
new techniques for summarizing data and for estimating
answers from summarized data must be developed. This paper
introduces two new sampling-based summary statistics,
concise samples and counting samples, and presents new
techniques for their fast incremental maintenance regardless
of the data distribution. We quantify their advantages over
standard sample views in terms of the number of additional
sample points for the same view size, and hence in providing
more accurate query answers. Finally, we consider their
application to providing fast approximate answers to hot list
queries. Our algorithms maintain their accuracy in the
presence of ongoing insertions to the data warehouse.
For more information, see
Phillip B. Gibbons home page
or
Yossi Matias home page.