We present a new family of join algorithms, called ripple joins, for online
processing of multi-table aggregation queries in a relational database
management system (DBMS). Such queries arise naturally in interactive
exploratory decision-support applications. Traditional offline join
algorithms are designed to minimize the time to completion of the query. In
contrast, ripple joins are designed to minimize the time until an acceptably
precise estimate of the query result is available, as measured by the length
of a confidence interval. This time is independent of the size of the input
relations. Ripple joins are adaptive, adjusting their behavior during
processing in accordance with the statistical properties of the data.
Ripple joins also permit the user to dynamically trade off the two key
performance factors of online aggregation: the time between successive
updates of the running aggregate, and the amount by which the
confidence-interval length decreases at each update. We show how ripple
joins can be implemented in an existing DBMS using iterators, and we give an
overview of the methods used to compute confidence intervals and to
adaptively optimize the ripple join ``aspect-ratio'' parameters. In
experiments with an initial implementation of our algorithms in the Postgres
DBMS, the time required to produce reasonably precise online estimates was
up to two orders of magnitude smaller than the time required for the best
offline join algorithms to produce exact answers.